1 Introduction

1.1 Background

Models of infectious disease transmission have, from relatively modest beginnings (e.g. Bailey 1957), developed a rich domain of applicability covering the whole spectrum of human, animal and plant pathogens, and informing the study of questions from viral evolution, through epidemiology of infectious diseases, to public health policy (see Heesterbeek et al. 2015). Increasingly, networks have been seen as a way of modelling the complex, heterogeneous patterns of contacts between individuals (Danon et al. 2011).

In theoretical studies, the configuration model has been a popular choice due to the ability to specify the number of contacts each individual has that are capable of spreading disease, while allowing for analytic results to be obtained (e.g. Molloy and Reed 1995; Newman 2002). Ball and Neal (2008) used an effective degree approach (which we describe in Sect. 2.2 below—cf. Lindquist et al. 2010) to derive a system of ordinary differential equations that describes the deterministic limit of the epidemic model as the population size \(N \rightarrow \infty \). A much simpler (equivalent) system of only 4 ordinary differential equations was obtained by Volz (2008) and subsequently shown by Miller (2011), Miller et al. (2012) to be essentially one-dimensional (the 4 ODEs were also shown by House and Keeling (2010) to be a special case of the much higher dimensional pair approximation model of Eames and Keeling (2002), in which the degree structure is explicit). Fully rigorous proofs of convergence in probability of the scaled stochastic model to the deterministic limit are given by Decreusefond et al. (2012), Bohman and Picollelli (2012), Barbour and Reinert (2013) and Janson et al. (2014). These works are primarily concerned with the temporal behaviour of proportions of the population in different epidemiological compartments (susceptible, infectious and removed) over the main body of a large epidemic. Here, we are also concerned with temporal behaviour, but focus on numbers infected early in the epidemic, including the possibility of early stochastic extinction.

In a recent paper, Graham and House (2014) use a pairwise approximation in conjunction with the central limit theorem for density dependent population processes (Ethier and Kurtz 1986, Chap. 11) to obtain a closed-form approximation to the mean and variance of prevalence in the linearised model which approximates the early asymptotic exponential growth phase of a Markovian SIR epidemic on a configuration network. In particular, they find that, under these approximations, the variance in disease prevalence is determined by the first three moments of the network degree distribution. In this paper, we use the effective degree approach of Ball and Neal (2008) to approximate the early stages of the epidemic by a continuous-time, multitype Markovian branching process, which is then analysed in detail. For \(t\ge 0\), let Z(t) denote the total number of individuals alive in this branching process at time t, so Z(t) approximates disease prevalence in the epidemic model during its early asymptotic exponential growth phase. Explicit closed-form expressions are derived for the mean and variance of Z(t), the covariance of Z(t) and Z(s) to give the behaviour over time, and also for the probability of extinction \(\pi (t) = \mathbb {P}(Z(t)=0)\). As in Graham and House (2014), the mean and variance in disease prevalence depends on the degree distribution only through its first two and three moments, respectively.

The results in Graham and House (2014) assume implicitly that the initial number of infectives is sufficiently large for the density dependent population process central limit theorem to yield a good approximation. In contrast, our results assume any arbitrary, but specified, initial number of infectives. The asymptotic distribution of types in the branching process, when it does not go extinct, is also available in closed-form and enables us to obtain a Gaussian process approximation, with explicit mean and covariance function, for the prevalence in the early asymptotic exponential growth phase of an SIR epidemic, with few initial infectives, which takes off and becomes established. We show that this approximation can be applied together with the methods of Ross et al. (2006) to estimate epidemiological parameters from early prevalence data of a simulated epidemic provided the first three moments of the degree distribution are known.

1.2 Outline of the paper

The paper is organised as follows. The configuration network model and a Markov SIR epidemic on that network are described in Sect. 2.1. The effective degree construction of this epidemic is outlined in Sect. 2.2. Approximation of the early stages of this epidemic by a branching process is outlined in Sect. 2.3, where conditions are given for the mean, variance and covariance functions of the number of infectives in the epidemic process to converge to the corresponding quantities of the approximating branching process as the population size tend to infinity. The representation of the approximating branching process as a continuous-time, multitype Markov branching process is outlined in Sect. 2.3 and described more explicitly in Sect. 2.4. The mean, variance and covariance functions of the total number of individuals alive in the branching process are considered in Sects. 34 and 5, respectively. Explicit closed-form expressions are obtained for each of these quantities and for their limits as time \(t \rightarrow \infty \). The arguments in Sects. 34 and 5 assume that underlying degree distribution has a maximum degree. In Sect. 6, we show that these expressions continue to hold in the unbounded degree setting, subject to the degree distribution satisfying suitable moment conditions. The probability that the branching process is extinct at time t is studied in Sect. 7. Closed-form expressions for this probability, given the initial state of the branching process, are not available so asymptotic results as \(t \rightarrow 0\) and \(t \rightarrow \infty \) are considered.

The mean, variance and covariance functions derived in Sects. 34 and 5 are unconditional, so they include realisations of the branching process which result in extinction. However, in the epidemic setting, we are often interested in analysing the behaviour of epidemics that take off and become established, which correspond to non-extinction of the branching process. In Sect. 8, we first derive the mean and variance of the total number of individuals alive in the branching process at time t, conditional upon the process having survived to time t; fully closed-form results are not available owing to the absence of a closed-form expression for the survival probability. We then consider realisations of the branching process which reach some specified size, K say, with time being set to zero the first time the total number of individuals alive is K. The results in Sect. 3 yield an explicit expression for the asymptotic distribution of types, given that the branching process does not go extinct, which, provided K is sufficiently large, enables the above branching process starting from K individuals to be approximated by a Gaussian process whose mean and covariance functions are determined explicitly. The theory is illustrated by numerical examples of both forward simulation and inference in Sect. 9 and some concluding comments are given in Sect. 10.

In general, we define notation as it is introduced; we also collect notation that is used in multiple sections in Table 1.

Table 1 Notation used in multiple sections of this paper

2 Model and approximating branching process

2.1 Model

We consider the spread of an SIR epidemic on a network of N individuals, labelled \(1,2,\ldots ,N\), constructed using the configuration model as follows (see e.g. Newman 2002). Let D be a random variable which describes the degree of a typical individual and let \(p_k=\mathbb {P}(D=k)\) \((k=0,1,\ldots )\). Let \(D_1,D_2,\ldots ,D_N\) be independent realisations of D and, for \(i=1,2,\ldots , N\), attach \(D_i\) stubs (half-edges) to individual i. Pair up these stubs uniformly at random to form the edges in the network. If \(D_1+D_2+\cdots +D_N\) is odd, there will be a left-over stub, which is ignored; the resulting network may have other ‘defects’ such as self-loops and multiple edges between pairs of individuals but, provided that D has finite variance, such imperfections become sparse in the network as \(N \rightarrow \infty \) (see e.g. Durrett 2007, Theorem 3.1.2). An alternative to the degrees \(D_1,D_2,\ldots ,D_N\) being random is, for each \(N=1,2,\cdots \), to replace \( \mathbf {D} =(D_1,D_2,\ldots ,D_N)\) by \( \mathbf {D} ^{(N)}=\left( D_1^{(N)},D_2^{(N)},\ldots ,D_N^{(N)}\right) \), where the degree sequences \( \mathbf {D} ^{(N)}\) \((N=1,2,\ldots )\) are prescribed and satisfy \(p_k^{(N)}=N^{-1} \sum _{i=1}^N \delta _{k,D_i^{(N)}} \rightarrow p_k\) as \(N \rightarrow \infty \) \((k=0,1,\ldots )\), where the Kronecker delta \(\delta _{k,j}\) is 1 if \(k=j\) and 0 otherwise (see e.g. Molloy and Reed 1995).

The epidemic is defined as follows. Initially some individuals are infective and the remaining individuals are susceptible. Infective individuals have independent infectious periods, each having an exponential distribution with rate \(\gamma \) (and hence mean \(\gamma ^{-1}\)), after which they become recovered and play no further role in the epidemic. Throughout its infectious period, an infective contacts each of its susceptible neighbours in the network at the points of independent Poisson processes having rate \(\tau \), so the probability that a given infective contacts a given neighbour before the infective recovers is \(\tau /(\gamma +\tau )\). Any contacted susceptible immediately becomes infective and may transmit the infection to any of its neighbouring susceptibles; i.e. there is no latent period. All the infectious periods and Poisson processes governing transmission of infection are mutually independent. The epidemic ends as soon as there is no infective present in the network.

2.2 The effective degree model

Ball and Neal (2008) introduced an ‘effective-degree’ construction of the above epidemic, in which the network is constructed as the epidemic progresses. The process starts with some individuals infective and the remaining individuals susceptible, but with none of the stubs paired up. For \(i=1,2,\ldots ,N\), the effective degree of individual i is initially \(D_i\). Infected individuals transmit infection by pairing their stubs with stubs attached to susceptible individuals in the following fashion. An infected individual makes infectious contacts down its unpaired stubs independently at rate \(\tau \) and is removed at rate \(\gamma \). When an infective, individual i say, transmits infection down a stub that stub is paired with a stub (attached to individual j, say) chosen independently and uniformly at random from all the unpaired stubs, to form an edge. The effective degrees of individuals i and j are both reduced by 1. If \(i=j\) then the effective degree of individual i is reduced by 2 but this will not significantly affect the dynamics for large populations since the probability of it happening is \(O(N^{-1}\)). If individual j is susceptible then it becomes infective and can transmit infection down any of its unattached stubs. As before, the epidemic ends as soon as there is no infective present. The network is then typically only partially constructed but that does not matter if interest is focussed on properties of the epidemic. In the original formulation of Ball and Neal (2008), when an infective recovers its unpaired stubs, if any, were paired with stubs chosen uniformly at random without replacement from the set of unpaired stubs but that is unnecessary; the stubs from such an infective can simply be left in the set of unpaired stubs.

2.3 Approximating multitype branching process

Suppose that the size N of the network is large and the initial number of infectives is small. Then during the early stages of an epidemic it is very likely that each time an infective individual transmits infection down a stub that stub is paired with a stub belonging to a susceptible individual. It follows that the early stages of such an epidemic can be approximated by a branching process in which each newly-infected individual has their “full” effective degree (i.e. their actual degree minus one for the stub that is paired with their infector). This approximation can be made fully rigorous by considering a sequence of epidemics, indexed by N, and using a coupling argument; see e.g. Ball and Neal (2008), which treats a more general model in which infective individuals also make contacts with individuals chosen uniformly at random from the population. Let \(E_N\) denote the epidemic on a network of N individuals and let \(\mathscr {B}\) denote the approximating branching process. Then following Ball and Neal (2008) (see “Appendix 1”) if \(\mu _D=\mathbb {E}\left[ D\right] \) is finite then the epidemics \(E_1,E_2\dots \) and the branching process \(\mathscr {B}\) can be constructed on a common probability space so that, with probability one, over any finite time interval [0, t] the process of infectives in \(E_N\) and the branching process \(\mathscr {B}\) coincide for all sufficiently large N. The same result holds for the model with prescribed degree sequences provided that \(p^{(N)}_k \rightarrow p_k\) \((k=0,1,\ldots )\) and \(\mu _D^{(N)}=\sum _{k=0}^\infty p^{(N)}_k k \rightarrow \mu _D\) as \(N \rightarrow \infty \), where \(\sum _{k=0}^{\infty } p_k=1\) and \(\mu _D < \infty \).

As indicated in “Appendix 1”, the branching process \(\mathscr {B}\) is not an almost sure upperbound for the process of infectives in \(E_N\), so unlike in Theorem 3.1 of Ball and Donnelly (1995) which considers homogeneously mixing epidemics, one cannot simply use the dominated convergence theorem to deduce convergence of moments of the number of infectives in the epidemic process to corresponding moments of the branching process as \(N \rightarrow \infty \). If there is a maximum degree \(k_{\mathrm {max}}\) (i.e. \(p_k=0\) for all \(k>k_{\mathrm {max}}\), or \(p_k^{(N)}=0\) for all \(k>k_{\mathrm {max}}\) and all N in the model with prescribed degrees) then, for all N, the process of infectives in \(E_N\) is bounded above by a branching process in which each newly-infected individual has the maximun effective degree \(k_{\mathrm {max}}-1\), so in that case the dominated convergence theorem can be used to prove convergence of moments. In “Appendix 1”, we consider the case when there is no maximum degree and use uniform integrability arguments to determine sufficient conditions for the mean and variance of the number of infectives at any given time \(t \ge 0\), and the covariance of the number of infectives at any given times \(t,s \ge 0\), in the epidemic \(E_N\) to converge to the corresponding mean, variance and covariance of the branching process \(\mathscr {B}\) as \(N \rightarrow \infty \). Specifically, we prove that (i) in the model with prescribed degrees these moments converge if, in addition to the conditions given above, there exists \(\delta >0\) such that \(\mu _{D^{3+\delta }}^{(N)}=\sum _{k=0}^\infty p^{(N)}_k k^{3+\delta } \rightarrow \mu _{D^{3+\delta }}=\sum _{k=0}^\infty p_k k^{3+\delta } \) as \(N \rightarrow \infty \), where \(\mu _{D^{3+\delta }} < \infty \); and (ii) in the model with random degrees they converge if the moment-generating function \(M_{D^2}(\theta )=\mathbb {E}\left[ \exp \left( \theta D^2\right) \right] \) of \(D^2\) is finite for some \(\theta _0>0\). Note that the latter condition implies that \(\mathbb {E}\left[ D^\alpha \right] < \infty \) for all \(\alpha \ge 0\).

In this context, we note that, for the model with prescribed degrees, the weakest conditions obtained on the moments of the degree distribution for convergence of the scaled stochastic epidemic on to its deterministic limit are given by Janson et al. (2014), who require uniform boundedness of the second moment of \( \mathbf {D} ^{(N)}\). However that paper, and the other related papers cited in the second paragraph of Sect. 1.1, (i) are concerned with the entire time course of the epidemic; (ii) assume that either the epidemic starts with a positive fraction of the population infected in the limit as \(N \rightarrow \infty \), or if that limiting fraction is zero then the convergence is for epidemics which take off and involves a random time translation describing when the epidemic becomes suitably established; and (iii) consider the evolution of the proportion of the population that is susceptible, infective or recovered. By contrast, this paper is concerned with epidemics initiated by few infectives and considers the number, rather than proportion, of infectives during the early phase of such an epidemic. Under the coupling mentioned above, in the limit as \(N \rightarrow \infty \), if an epidemic takes off then the duration of its early (exponentially growing) phase tends almost surely to infinity.

The limiting branching process may be described by a continuous-time multitype Markov branching process, with the type of an infective corresponding to its effective degree. Let \({\tilde{D}}\) denote the (size-biased) degree of a typical neighbour of a typical individual in the network and let \({\tilde{p}}_k=\mathbb {P}({\tilde{D}}=k)\) \((k=1,2,\ldots )\). Then \({\tilde{p}}_k=\mu _D^{-1} k p_k\), where \(\mu _D=\mathbb {E}\left[ D\right] \), since when a stub is paired it is k times as likely to be paired with a stub from a given individual having degree k than it is with a stub from a given individual having degree 1. Under the branching process approximation, the effective degree of a newly infected individual is distributed according to \({\tilde{D}}-1\), since one of that individual’s stubs is ‘used up’ when it is infected. Note for future reference that \(\mu _{{\tilde{D}}}=\mu _D^{-1}\mu _{D^2}\) and, more generally, \(\mu _{f({\tilde{D}})}=\mu _D^{-1} \mu _{Df(D)}\) for any real-valued function f. (For a random variable, X say, we use \(\mu _X\) to denote its expectation \(\mathbb {E}\left[ X\right] \). Thus, for example, \(\mu _{Df(D)}=\mathbb {E}\left[ Df(D)\right] \).)

2.4 Explicit form for the multitype branching process

We now assume that there is a maximum degree \(k_{\mathrm {max}}\). We show in Sect. 6 that our results for moments of the branching process extend to the case of no maximum degree size, subject to suitable moment conditions on D. Thus the type space for the branching process is \(\mathscr {K}=\{0,1,\ldots ,k_{\mathrm {max}}\}\). Note that only initial infectives can have type \(k_{\mathrm {max}}\). For \(k\in \mathscr {K}\), an individual of type k dies if either its infectious period comes to an end or it transmits infection down one of its unattached stubs, whichever happens first. If the former happens first then the individual has no offspring, otherwise it has two offspring, namely an individual of type \(k-1\) and an individual whose type is distributed according to \({\tilde{D}}-1\). Note that an individual of type 0 necessarily has no offspring when it dies. Thus, for \(k\in \mathscr {K}\), the lifetime of an individual of type k has an exponential distribution with rate

$$\begin{aligned} \omega _k = \gamma + \tau k, \end{aligned}$$
(2.1)

and when it dies its offspring is distributed as follows (recall that \({\tilde{p}}_{k_{\mathrm {max}}+1}=0\)):

$$\begin{aligned} \mathbb {P}\left( \mathrm {Offspring} = \varnothing | \mathrm {Parent\ type} = k \right)&= \frac{\gamma }{\gamma + \tau k}, \nonumber \\ \mathbb {P}\left( \mathrm {Offspring} = \{k-1,l\} | \mathrm {Parent\ type} = k \right)&= \frac{\tau k {\tilde{p}}_{l+1}}{\gamma + \tau k} \quad (l=0,1,\ldots ,k_{\mathrm {max}}-1) . \end{aligned}$$
(2.2)

The joint probability-generating function (PGF) for offspring of a type-k individual is therefore

$$\begin{aligned} P_k( \mathbf {s} ) = \frac{1}{\omega _k} \left( \gamma + \tau k s_{k-1} \sum _{l=0}^{k_{\mathrm {max}}-1} {\tilde{p}}_{l+1} s_l \right) , \end{aligned}$$
(2.3)

where \( \mathbf {s} =(s_k)\). In general we will write \( \mathbf {v} = (v_k) = (v_0,v_1,\ldots ,v_{k_{\mathrm {max}}})^{\!\top }\) for a column vector in \(\mathbb {R}^{k_{\mathrm {max}}+1}\), where \(^\top \) denotes transpose. Verbally, we will call \(v_0\) the 0th element of such a vector, \(v_1\) the first element etc. For \(k\in \mathscr {K}\), let \( \varvec{\partial } P_k( \mathbf {s} )\) be the column vector whose ith element is \(\frac{\partial P_k( \mathbf {s} )}{\partial s_i}\) and let \( \varvec{\partial } ^2 P_{k}( \mathbf {s} )\) be the matrix whose (ij)th element is \(\frac{{\partial }^2 P_{k}( \mathbf {s} )}{\partial s_i \partial s_j}\). We note for future reference that, for \(i,j,k\in \mathscr {K}\),

$$\begin{aligned} \left( \varvec{\partial } P_k(\mathbf {1})\right) _i&=\frac{\tau k}{\gamma +\tau k}\left( {\tilde{p}}_{i+1}+\delta _{k-1,i}\right) , \nonumber \\ \left[ \varvec{\partial } ^2 P_{k}(\mathbf {1})\right] _{i,j}&=\frac{\tau k}{\gamma +\tau k}\left( {\tilde{p}}_{i+1}\delta _{k-1,j}+{\tilde{p}}_{j+1}\delta _{k-1,i}\right) , \end{aligned}$$
(2.4)

where \(\mathbf {1}\) is the length-\((k_{\mathrm {max}}+1)\) column vector of ones.

For \(t \ge 0\), let \(\mathbf {Z}(t)=\left( Z_i(t)\right) \), where \(Z_i(t)\) denotes the number of individuals of type i alive at time t, and let \(Z(t)=Z_0(t)+Z_1(t)+\cdots +Z_{k_{\mathrm {max}}}(t)= \mathbf {1}^{\!\top }\mathbf {Z}(t) \) denote the total number individuals alive at time t. For \(k\in \mathscr {K}\), we use the notation \(\{\mathbf {Z}^{(k)}(t):t \ge 0\}\), where \(\mathbf {Z}^{(k)}(t)=(Z^{(k)}_i(t))\), to denote a process starting with a single individual, whose type is k, at time 0 (i.e. where \(Z_i^{(k)}(0) = \delta _{i,k}\), \(i \in \mathscr {K}\)). Further, \(Z^{(k)}(t)=\mathbf {1}^{\!\top }{\mathbf {Z}^{(k)}(t)}\) denotes the total number of individuals at time t in such a process.

3 Behaviour of means

In the next three sections we consider the behaviour of the mean, variance and covariance function of the total number of individuals over time in the branching process which approximates the initial phase of an epidemic. For \(t \ge 0\) and \(i,j,k\in \mathscr {K}\), let

$$\begin{aligned} \varvec{M} (t)&=[m_{i,j}(t)], \text {where}\quad m_{i,j}(t)=\mathbb {E}\left[ Z^{(i)}_j(t)\right] , \\ \mathbf {m} ^{(k)}(t)&=\mathbb {E}\left[ \mathbf {Z}^{(k)}(t)\right] = \varvec{M} (t)^{\!\top }{ \mathbf {u} _k} , \\ m^{(k)}(t)&=\mathbb {E}\left[ Z^{(k)}(t)\right] =\mathbf {1}^{\!\top } \mathbf {m} ^{(k)}(t) = \mathbf {u} _k^{\!\top } \varvec{M} (t) \mathbf {1} , \end{aligned}$$

where \( \mathbf {u} _k\) is a length-\((k_{\mathrm {max}}+1)\) column vector with kth element equal to 1 and other elements equal to 0. A standard argument using the Kolmogorov forward equation (see e.g. Dorman et al. 2004, Sect. 7 and recall that \({\tilde{p}}_{k_{\mathrm {max}}+1}=0\)) then yields that

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}t} \varvec{M} (t)= \varvec{M} (t) \varvec{{\varOmega }} ,\qquad \varvec{M} (0)= \varvec{I} , \end{aligned}$$
(3.1)

where \( \varvec{I} \) denotes the \((k_{\mathrm {max}}+1) \times (k_{\mathrm {max}}+1)\) identity matrix and \( \varvec{{\varOmega }} =[{\varOmega }_{l,k}]\) is the \((k_{\mathrm {max}}+1) \times (k_{\mathrm {max}}+1)\) matrix with elements given by

$$\begin{aligned} {\varOmega }_{l,k} = \tau l \left( {\tilde{p}}_{k+1} + \delta _{l,k+1}\right) -(\gamma + \tau l) \delta _{l,k} \qquad (l,k \in \mathscr {K}) . \end{aligned}$$

The solution to (3.1) is then straightforwardly given by

$$\begin{aligned} \varvec{M} (t)=\mathrm{e}^{ \varvec{{\varOmega }} t} =\sum _{l=0}^\infty \frac{t^l \varvec{{\varOmega }} ^l}{l!} . \end{aligned}$$
(3.2)

We show in “Appendix 2” that the eigenvalues of \( \varvec{{\varOmega }} \) are

$$\begin{aligned} \lambda _i = {\left\{ \begin{array}{ll} - \gamma - i \tau &{} \text { for } i \in \{0,2,3,\ldots ,k_{\mathrm {max}}\} ,\\ \tau \left( \left( {\textstyle \sum _{l=0}^{k_{\mathrm {max}}}} l {\tilde{p}}_{l+1}\right) -1\right) - \gamma &{} \text { for } i = 1 . \end{array}\right. } \end{aligned}$$
(3.3)

We denote the dominant eigenvalue, \(\lambda _1\), by r, so

$$\begin{aligned} r= \tau \left( \left( {\textstyle \sum _{l=0}^{k_{\mathrm {max}}}} l {\tilde{p}}_{l+1}\right) -1\right) - \gamma = \tau \mu _{{\tilde{D}}-2} - \gamma . \end{aligned}$$
(3.4)

If \(r \le 0\), the branching process \(\{\mathbf {Z}(t):t \ge 0\}\) goes extinct almost surely. If \(r>0\), then r gives the asymptotic exponential growth rate of \(\{Z(t):t \ge 0\}\) (and also of \(\{Z_i(t):t \ge 0\}\) for \(i \in \mathscr {K} \setminus \{k_{\mathrm {max}}\}\)) when \(\{\mathbf {Z}(t):t \ge 0\}\) does not go extinct.

For \(i\in \mathscr {K}\), let \( \mathbf {w} _i^{\!\top } = (w_{i,k})\) be a left eigenvector of \( \varvec{{\varOmega }} \) corresponding to the eigenvalue \(\lambda _i\), so

$$\begin{aligned} \mathbf {w} _i^{\!\top } \varvec{{\varOmega }} = \lambda _i \mathbf {w} _i^{\!\top } . \end{aligned}$$
(3.5)

The Perron–Frobenius theory implies that \( \mathbf {w} _1\) can be chosen so that all of its elements are positive and \( \mathbf {w} _1^{\!\top } \mathbf {1}=1\). The left-eigenvector \( \mathbf {w} _1\) then yields a probability distribution which gives the asymptotic relative frequencies of the different types, as \(t \rightarrow \infty \), when \(\{\mathbf {Z}(t):t \ge 0\}\) does not go extinct.

Expanding (3.5) in components yields

$$\begin{aligned} \sum _{l=0}^{k_{\mathrm {max}}} w_{1,l} \left( \tau l \left( {\tilde{p}}_{k+1} + \delta _{l,k+1}\right) -(\gamma + \tau l) \delta _{l,k} \right) = r w_{1,k} \quad (k\in \mathscr {K}) . \end{aligned}$$
(3.6)

Let \(w(s)=\sum _{l=0}^{k_{\mathrm {max}}} s^l w_{1,l}\) \((s \ge 0)\) denote the (probability-)generating function of \( \mathbf {w} _1\). Multiplying (3.6) by \(s^k\) and summing over k yields

$$\begin{aligned} \tau f_{{\tilde{D}}-1}(s)\mu _W+\tau (1-s) w'(s)=(r+\gamma )w(s), \end{aligned}$$
(3.7)

where \(f_{{\tilde{D}}-1}(s)=\sum _{k=1}^{k_{\mathrm {max}}} {\tilde{p}}_k s^{k-1}\) is the PGF of \({\tilde{D}}-1\) and \(\mu _W=\sum _{k=0}^{k_{\mathrm {max}}} k w_{1,k}\) is the mean of the distribution \( \mathbf {w} _1\). Setting \(s=1\) in (3.7) and using (3.4) yields

$$\begin{aligned} \mu _W=\frac{r+\gamma }{\tau }=\mu _{{\tilde{D}}-2}. \end{aligned}$$
(3.8)

Note that (3.8) has a simple intuitive explanation. For large t, a typical individual gives birth at rate \(\sum _{l=0}^{k_{\mathrm {max}}}w_{1,l} l \tau =\tau \mu _W\) and dies completely (i.e. without producing any offspring) at rate \(\gamma \), so the population growth rate \(r=\tau \mu _W-\gamma \) and (3.8) follows using (3.4) .

For \(i,k \in \mathbb {Z}^+\), let \(k_{[i]}=k(k-1)\ldots (k-i+1)\) denote a falling factorial, with the convention \(k_{[0]}=1\). For \(i=0,1,\ldots \), let \(\mu _W^{[i]}=\sum _{k=0}^{k_{\mathrm {max}}} k_{[i]} w_{1,k}\) be the ith factorial-moment of the distribution \( \mathbf {w} _1\), so \(\mu _W^{[0]}=1\) and \(\mu _W^{[1]}=\mu _W\). Note that \(\mu _W^{[i]}=w^{(i)}(1)\) \((i=0,1,\ldots )\), where \(w^{(i)}(s)\) denotes the ith derivative of w(s). Repeated differentiation of (3.7) yields

$$\begin{aligned} \mu _W^{[i]}=\frac{\mu _{{\tilde{D}}-2}}{\mu _{{\tilde{D}}-2+i}}\mu _{{\tilde{D}}-1}^{[i]}, \end{aligned}$$
(3.9)

where \(\mu _{{\tilde{D}}-1}^{[i]}\) is the ith factorial-moment of \({\tilde{D}}-1\). Note that \(\mu _{{\tilde{D}}-1}^{[i]}=0\) for \(i \ge k_{\mathrm {max}}\). It then follows, using the inversion formula which expresses the probability mass function of a non-negative integer-valued random variable in terms of its factorial-moments (see e.g. Daley and Vere-Jones 1988, p. 117), that

$$\begin{aligned} w_{1,k}={\left\{ \begin{array}{ll} \sum _{i=k}^{k_{\mathrm {max}}-1} (-1)^{i-k} {i \atopwithdelims ()k}\frac{\mu _{{\tilde{D}}-2} \mu _{{\tilde{D}}-1}^{[i]}}{i! \mu _{{\tilde{D}}-2+i}} &{} \text { if } k=0,1,\ldots ,k_{\mathrm {max}}-1 ,\\ 0 &{} \text { if } k = k_{\mathrm {max}}. \end{array}\right. } \end{aligned}$$
(3.10)

Observe that \(w_{1,k_{\mathrm {max}}}=0\) since only initial infectives can have type \(k_{\mathrm {max}}\). Observe also that \( \mathbf {w} _1\) is determined just by the degree distribution of the network and is invariant to the epidemic parameters \(\gamma \) and \(\tau \).

For \(t\ge 0\), let \( \mathbf {m} (t)=(m^{(i)}(t))=( \varvec{M} (t)\mathbf {1})^{\!\top }\). Thus the kth element of \( \mathbf {m} (t)\) contains the mean total population size at time t given that the process starts with a single individual whose type is k. We derive a simple expression for \( \mathbf {m} (t)\). The following proposition is useful.

Proposition 1

For a matrix \( \varvec{M} \) and vectors \( \mathbf {x} \), \( \mathbf {y} \) such that \( \varvec{M} \mathbf {x} = a \mathbf {x} + b \mathbf {y} \) and \( \varvec{M} \mathbf {y} = c \mathbf {y} \), where ab and c are scalars satisfying \(a \ne c\),

$$\begin{aligned} \mathrm{e}^{ \varvec{M} t} \mathbf {x} = \mathrm{e}^{at} \mathbf {x} + \frac{b}{a-c} \left( \mathrm{e}^{at} - \mathrm{e}^{ct}\right) \mathbf {y} \qquad \hbox {and}\qquad \mathrm{e}^{ \varvec{M} t} \mathbf {y} = \mathrm{e}^{c t} \mathbf {y} . \end{aligned}$$
(3.11)

Proof

The second identity follows straightforwardly from the definition of the matrix exponential and the fact that \( \mathbf {y} \) is a right eigenvector with eigenvalue c. For the first identity,

$$\begin{aligned} \mathrm{e}^{ \varvec{M} t} \mathbf {x}&= \sum _{i=0}^{\infty } \frac{1}{i!}( \varvec{M} t)^i \mathbf {x} \nonumber \\&= \sum _{i=0}^{\infty } \frac{t^i}{i!} \left( a^i \mathbf {x} + \sum _{j=0}^{i-1} b c^{i-1-j}a^j \mathbf {y} \right) \nonumber \\&= \mathrm{e}^{at} \mathbf {x} + \sum _{i=0}^{\infty } \frac{t^i}{i!} b c^{i-1} \frac{\left( \frac{a}{c}\right) ^i -1}{\left( \frac{a}{c}\right) -1} \mathbf {y} \nonumber \\&= \mathrm{e}^{at} \mathbf {x} + \frac{b}{a-c} \left( \mathrm{e}^{at} - \mathrm{e}^{ct}\right) \mathbf {y} . \end{aligned}$$
(3.12)

\(\square \)

Let \( \mathbf {n} =(0,1,\ldots ,k_{\mathrm {max}})^{\!\top }\). Observe that

$$\begin{aligned} \varvec{{\varOmega }} \mathbf {1}= \tau \mathbf {n} - \gamma \mathbf {1}\qquad \text{ and }\qquad \varvec{{\varOmega }} \mathbf {n} = r \mathbf {n} , \end{aligned}$$
(3.13)

so using Proposition 1 with \( \varvec{M} = \varvec{{\varOmega }} , \mathbf {x} =\mathbf {1}, \mathbf {y} = \mathbf {n} , a=-\gamma , b=\tau \) and \(c=r\), and recalling from (3.4) that \(r+\gamma =\mu _{{\tilde{D}}-2}\tau \), we have

$$\begin{aligned} \mathrm{e}^{ \varvec{{\varOmega }} x} \mathbf {1}=\mu _{{\tilde{D}}-2}^{-1} \left( \mathrm{e}^{rx} - \mathrm{e}^{-\gamma x} \right) \mathbf {n} + \mathrm{e}^{-\gamma x} \mathbf {1}\qquad \text{ and } \qquad \mathrm{e}^{ \varvec{{\varOmega }} x} \mathbf {n} = \mathrm{e}^{r x} \mathbf {n} . \end{aligned}$$
(3.14)

Thus,

$$\begin{aligned} \mathbf {m} (t)=\mu _{{\tilde{D}}-2}^{-1}\left( \mathrm{e}^{rt} - \mathrm{e}^{-\gamma t} \right) \mathbf {n} + \mathrm{e}^{-\gamma t}\mathbf {1}\end{aligned}$$
(3.15)

and

$$\begin{aligned} \lim _{t \rightarrow \infty }\mathrm{e}^{-r t} \mathbf {m} (t)=\mu _{{\tilde{D}}-2}^{-1} \mathbf {n} . \end{aligned}$$
(3.16)

While it is well known that asymptotically the mean prevalence grows exponentially with rate constant r, i.e. that \(\mathrm {prevalence} \propto \mathrm{e}^{rt}\), these results allow us to see from (3.15) that the rate of convergence to this asymptotic behaviour is \(r+\gamma \), and from (3.16) that the constant of proportionality is the degree of the initially infected individual divided by \(\mu _{{\tilde{D}}-2}\).

We also consider the relationship between the equations above and the diverse ODE approaches to the mean behaviour of the full network epidemic model. Miller and Kiss (2014) consider several such approaches; their notation can be related to ours by defining

$$\begin{aligned} I(t) = \mathbf {m} ^{(k)}(t)^{\!\top } \mathbf {1} , \qquad \lambda (t) = \mathbf {m} ^{(k)}(t)^{\!\top } \mathbf {n} . \end{aligned}$$
(3.17)

Substituting (3.17) and (3.13) into (3.1) gives

$$\begin{aligned} \frac{\mathrm {d}I}{\mathrm {d}t} = \tau \lambda - \gamma I , \qquad \frac{\mathrm {d}\lambda }{\mathrm {d}t} = r \lambda . \end{aligned}$$
(3.18)

This pair of equations can be derived from various models considered by Miller and Kiss (2014, c.f. Sect. 3.4.1) assuming an initially small infectious population and negligible susceptible depletion. Therefore, our results suggest that the ODE approaches to mean behaviour do not require correction as the infectious population becomes extremely small, and the typical assumption that \(1 \ll I(t=0) \ll N\) for the ODE system to hold may be too conservative, with \(N \gg 1\) being all that is required.

4 Variance

The variance in infectious prevalence during the exponentially growing phase of an epidemic was considered in Graham and House (2014), but using the diffusion limit and an argument about the neighborhood around an infective. A branching process limit lets us be more explicit. For \(k\in \mathscr {K}\), let \( \mathbf {u} _k\) denote the length-\((k_{\mathrm {max}}+1)\) column vector whose element corresponding to type k is 1 and all other elements are 0, so \({ \mathbf {u} _k}=(\delta _{i,k})\). For \(t \ge 0\) and \(k \in \mathscr {K}\), let \(\sigma _{ij}^{(k)}(t)=\mathrm{cov}\left( Z^{(k)}_i(t),Z^{(k)}_j(t)\right) \) \((i,j \in \mathscr {K})\).

A matrix integrating factor argument gives

$$\begin{aligned} \varvec{V} ^{(k)}(t)=\left[ \sigma _{ij}^{(k)}(t)\right]&= \mathbb {E}\left[ \mathbf {Z} ^{(k)}(t) \mathbf {Z} ^{(k)}(t)^{\!\top }\right] -\mathbb {E}\left[ \mathbf {Z} ^{(k)}(t)\right] \mathbb {E}\left[ \mathbf {Z} ^{(k)}(t)^{\!\top }\right] \nonumber \\&= \int _{0}^{t} \mathrm{e}^{ \varvec{{\varOmega }} ^{\!\top }(t-u)} \varvec{B} _k(u) \mathrm{e}^{{ \varvec{{\varOmega }} }(t-u)} \mathrm {d}u , \end{aligned}$$
(4.1)

where

$$\begin{aligned} \varvec{B} _k(t)&= \sum _{l=0}^{k_{\mathrm {max}}} \left( \mathrm{e}^{ \varvec{{\varOmega }} t}\right) _{k,l} \varvec{C} _l ,\nonumber \\ \varvec{C} _k&= \omega _k \left( \varvec{\partial } ^2 P_{k}(\mathbf {1}) +\mathrm {diag}( \mathbf {f} _k) - \mathbf {u} _k \mathbf {f} _k^{\!\top }- \mathbf {f} _k \mathbf {u} _k^{\!\top }+ \mathbf {u} _k \mathbf {u} _k^{\!\top } \right) , \nonumber \\ \mathbf {f} _k&= \varvec{\partial } P_k(\mathbf {1}) . \end{aligned}$$
(4.2)

See Dorman et al. (2004, Sect. 9), and also Athreya and Ney (1972, p. 203), for details.Footnote 1

For \(t \ge 0\) and \(k\in \mathscr {K}\), let \(v^{(k)}(t)\) denote the variance of the total population size at time t given that the process starts with a single individual, whose type is k. Then \(v^{(k)}(t)= \mathbf {1}^{\!\top } \varvec{V} ^{(k)}(t) \mathbf {1}\) and it follows using (4.1) that

$$\begin{aligned} v^{(k)}(t)&= \int _{0}^{t} (\mathrm{e}^{ \varvec{{\varOmega }} (t-u)} \mathbf {1})^{\!\top } \varvec{B} _k(u) (\mathrm{e}^{{ \varvec{{\varOmega }} }(t-u)} \mathbf {1}) \mathrm {d}u\nonumber \\&= \int _{0}^{t} \left( \mu _{{\tilde{D}}-2}^{-1} \left( \mathrm{e}^{r(t-u)} - \mathrm{e}^{-\gamma (t-u)} \right) \right) ^2 \mathbf {n} ^{\!\top } \varvec{B} _k(u) \mathbf {n} \; \mathrm {d}u \nonumber \\&\quad + 2 \int _{0}^{t} \mu _{{\tilde{D}}-2}^{-1} \left( \mathrm{e}^{r(t-u)} - \mathrm{e}^{-\gamma (t-u)} \right) \mathrm{e}^{-\gamma (t-u)} \mathbf {1} ^{\!\top } \varvec{B} _k(u) \mathbf {n} \; \mathrm {d}u\nonumber \\&\quad + \int _{0}^{t} \mathrm{e}^{-2 \gamma (t-u)} \mathbf {1} ^{\!\top } \varvec{B} _k(u) \mathbf {1} \; \mathrm {d}u , \end{aligned}$$
(4.3)

where we have used the first equation in (3.14) in deriving the last line. This quantity has an exact but rather complex closed-form solution, which we give below and derive in “Appendix 3”.

Let \( \mathbf {n} _2=(0^2,1^2,\ldots ,{k_{\mathrm {max}}}^2)^{\!\top }\) and, for \(t\ge 0\), let \( \mathbf {v} (t)=(v^{(i)}(t))\). Then

$$\begin{aligned} \mathbf {v} (t)=\alpha _0(t) \mathbf {1}+ \alpha _1(t) \mathbf {n} + \alpha _2(t) \mathbf {n} _2, \end{aligned}$$
(4.4)

where

$$\begin{aligned} \alpha _0(t)&= \gamma I_2(t),\nonumber \\ \alpha _1(t)&= \gamma \mu _{{\tilde{D}}-2}^{-1}\left[ I_1(t)-I_2(t)+2I_3(t)+\mu _{{\tilde{D}}-2}^{-1}\mu _{{\tilde{D}}}^{-1}\mu _{({\tilde{D}}-1)^2+1}(I_4(t)-I_5(t))\right] \nonumber \\&\quad +\tau \left[ I_1(t)+2I_3(t)+\mu _{{\tilde{D}}-2}^{-2}\mu _{({\tilde{D}}-2)^2}I_4(t)\right] ,\nonumber \\ \alpha _2(t)&= \gamma \mu _{{\tilde{D}}-2}^{-2} I_5(t), \end{aligned}$$
(4.5)

with

$$\begin{aligned} I_1(t)&=\frac{\mathrm{e}^{rt}-\mathrm{e}^{-2\gamma t}}{r+2\gamma },\nonumber \\ I_2(t)&=\frac{\mathrm{e}^{-\gamma t}\left( 1-\mathrm{e}^{-\gamma t}\right) }{\gamma },\nonumber \\ I_3(t)&=\frac{\mathrm{e}^{rt}\left( 1-\mathrm{e}^{-\gamma t}\right) }{\gamma }-I_1(t),\nonumber \\ I_4(t)&=\frac{\mathrm{e}^{rt}\left( \mathrm{e}^{rt}-1\right) }{r}-2I_3(t)-I_1(t) ,\nonumber \\ I_5(t)&=\frac{\mathrm{e}^{2rt}-\mathrm{e}^{-(\gamma +2\tau )t}}{2r+2\tau +\gamma } -2\frac{\mathrm{e}^{-\gamma t}\left( \mathrm{e}^{rt}-\mathrm{e}^{-2\tau t}\right) }{r+2\tau }+\frac{\mathrm{e}^{-\gamma t}\left( \mathrm{e}^{-2\tau t}-\mathrm{e}^{-\gamma t}\right) }{\gamma -2\tau }. \end{aligned}$$
(4.6)

In (4.6), if a denominator is zero then the expression is given by the limit as that denominator tends to zero. For example, if \(\gamma =0\) then \(I_2(t)=t\mathrm{e}^{-\gamma t}\), and if \(\gamma =2\tau \), the final term in \(I_5(t)\) is replaced by \(t\mathrm{e}^{-2 \gamma t}\).

Equation (4.4) leads to a rather complex expression for \( \mathbf {v} (t)\) in terms of elementary functions. However, its asymptotic form as \(t \rightarrow \infty \) is much simpler. Note that \(\lim _{t \rightarrow \infty } \mathrm{e}^{-2rt} I_k(t)=0\), for \(k=1,2,3\), \(\lim _{t \rightarrow \infty } \mathrm{e}^{-2rt} I_4(t)=\frac{1}{r}\) and \(\lim _{t \rightarrow \infty } \mathrm{e}^{-2rt} I_5(t)=\frac{1}{2r+2\tau +\gamma }=\frac{1}{2\mu _{{\tilde{D}}-1}\tau -\gamma }\). Substituting these limits into (4.4) and (4.5) yields

$$\begin{aligned} \lim _{t \rightarrow \infty } \mathrm{e}^{-2rt} \mathbf {v} (t)= \frac{1}{\mu _{{\tilde{D}}-2}^2\left( 2\mu _{{\tilde{D}}-1}\tau -\gamma \right) } \left[ \frac{2\tau \mu _{{\tilde{D}}-1}\left( \mu _{({\tilde{D}}-2)^2} \tau +\gamma \right) }{r} \mathbf {n} +\gamma \mathbf {n} _2\right] .\nonumber \\ \end{aligned}$$
(4.7)

Note that both the asymptotic and exact expressions for \( \mathbf {v} (t)\) depend on the degree distribution D only through its first three moments.

It follows from (3.16) and (4.7) that, for \(k\in \mathscr {K}\),

$$\begin{aligned} \lim _{t \rightarrow \infty }\frac{\mathrm{var}\left( Z^{(k)}(t)\right) }{\mathbb {E}\left[ Z^{(k)}(t)\right] ^2}= \frac{1}{2\mu _{{\tilde{D}}-1}\tau -\gamma } \left[ \gamma +\frac{2\tau \mu _{{\tilde{D}}-1}\left( \mu _{({\tilde{D}}-2)^2} \tau +\gamma \right) }{kr}\right] . \end{aligned}$$
(4.8)

We note two features of these results. First, the Eqs. (4.4) and (4.5) involve many rates that are linear combinations of r, \(\tau \) and \(\gamma \), with the dominant being 2r and the subdominant being r. This leads to complex real-time behaviour as the system approaches its asymptotic limit. In the diffusion limit, only the dominant and subdominant rates are present, leading to the same overall rate of convergence r, but other rates are not present (Graham and House 2014). Secondly, the dependence of the variance on initial conditions is not simple proportionality, meaning that (4.7) contains terms proportional to both \( \mathbf {n} \) and \( \mathbf {n} _2\) (unless \(\gamma =0\)) and the right-hand side of (4.8) depends on k.

5 Covariance function

For \(t,s \ge 0\) and \(k\in \mathscr {K}\), let \(\sigma ^{(k)}(t,s)=\mathrm{cov}\left( Z^{(k)}(t), Z^{(k)}(s)\right) \) denote the covariance of the total population sizes at times t and s in the branching process which approximates the early phase of an epidemic, given that the process starts with a single individual, whose type is k. We assume without loss of generality that \(t \le s\); although this choice does not respect alphabetical order, the majority of results that follow take t as an argument rather than s, and are therefore more easily read as functions of time. Then

$$\begin{aligned} \sigma ^{(k)}(t,s)&=\mathbb {E}\left[ \mathrm{cov}\left( Z^{(k)}(t), Z^{(k)}(s)|\mathbf {Z}^{(k)}(t)\right) \right] \nonumber \\&\quad +\mathrm{cov}\left( \mathbb {E}\left[ Z^{(k)}(t)|\mathbf {Z}^{(k)}(t)\right] ,\mathbb {E}\left[ Z^{(k)}(s)|\mathbf {Z}^{(k)}(t)\right] \right) . \end{aligned}$$
(5.1)

The first term on the right hand side of (5.1) is zero, since \(Z^{(k)}(t)\) is non-random given \(\mathbf {Z}^{(k)}(t)\). Now, \(\mathbb {E}\left[ Z^{(k)}(t)|\mathbf {Z}^{(k)}(t)\right] = \mathbf {1} ^{\!\top }\mathbf {Z}^{(k)}(t)\) and \(\mathbb {E}\left[ Z^{(k)}(s)|\mathbf {Z}^{(k)}(t)\right] =\mathbf {Z}^{(k)}(t)^{\!\top } \varvec{M} (s-t)\mathbf {1}\), so

$$\begin{aligned} \sigma ^{(k)}(t,s)&=\mathrm{cov}\left( \mathbf {1} ^{\!\top }\mathbf {Z}^{(k)}(t),\mathbf {Z}^{(k)}(t)^{\!\top } \varvec{M} (s-t)\mathbf {1}\right) \nonumber \\&= \mathbf {1} ^{\!\top } \varvec{V} ^{(k)}(t) \varvec{M} (s-t)\mathbf {1}\nonumber \\&=\mu _{{\tilde{D}}-2}^{-1}\left( \mathrm{e}^{r(s-t)}-\mathrm{e}^{-\gamma (s-t)}\right) \mathbf {1} ^{\!\top } \varvec{V} ^{(k)}(t) \mathbf {n} +\mathrm{e}^{-\gamma (s-t)}v^{(k)}(t), \end{aligned}$$
(5.2)

using (3.2), the first equation in (3.14) and noting that \( \mathbf {1} ^{\!\top } \varvec{V} ^{(k)}(t) \mathbf {1} =v^{(k)}(t)\). This leads to an exact, closed-form expression for the covariance function in terms of elementary functions, which we state below and derive in “Appendix 4”.

For \(t,s\ge 0\), let \( \mathbf {\sigma } (t,s)=\left( \sigma ^{(0)}(t,s),\sigma ^{(1)}(t,s),\ldots ,\sigma ^{(k_{\mathrm {max}})}(t,s)\right) ^{\!\top }\). Then

$$\begin{aligned} \mathbf {\sigma } (t,s)=\mu _{{\tilde{D}}-2}^{-1}\left( \mathrm{e}^{r(s-t)}-\mathrm{e}^{-\gamma (s-t)}\right) \left( \beta _1(t) \mathbf {n} + \beta _2(t) \mathbf {n} _2\right) +\mathrm{e}^{-\gamma (s-t)} \mathbf {v} (t), \end{aligned}$$
(5.3)

where

$$\begin{aligned} \beta _1(t)&= \gamma \mu _{{\tilde{D}}-2}^{-1}\left[ \mu _{{\tilde{D}}}^{-1}\mu _{({\tilde{D}}-1)^2+1}(I_7(t)-I_8(t))+\mu _{{\tilde{D}}-2}I_6(t)\right] \nonumber \\&\quad +\tau \mu _{{\tilde{D}}-2}^{-1}\left[ \mu _{({\tilde{D}}-2)^2}I_7(t)+ \mu _{{\tilde{D}}-2}^2 I_6(t)\right] ,\nonumber \\ \beta _2(t)&= \gamma \mu _{{\tilde{D}}-2}^{-1} I_8(t), \end{aligned}$$
(5.4)

with

$$\begin{aligned} I_6(t)&=\frac{\mathrm{e}^{rt}\left( 1-\mathrm{e}^{-\gamma t}\right) }{\gamma } ,\nonumber \\ I_7(t)&=\frac{\mathrm{e}^{rt}\left( \mathrm{e}^{rt}-1\right) }{r}-I_6(t) ,\nonumber \\ I_8(t)&=\frac{\mathrm{e}^{2rt}-\mathrm{e}^{-(\gamma +2\tau )t}}{2r+2\tau +\gamma }-\frac{\mathrm{e}^{-\gamma t}\left( \mathrm{e}^{rt}-\mathrm{e}^{-2\tau t}\right) }{r+2\tau }. \end{aligned}$$
(5.5)

As at (4.6), an appropriate limit is taken if a denominator in (5.5) is zero.

The covariance function takes a simple form in the limit as t and \(s \rightarrow \infty \). Note that \(\lim _{t \rightarrow \infty } \mathrm{e}^{-2rt} I_6(t)=0\), \(\lim _{t \rightarrow \infty } \mathrm{e}^{-2rt} I_7(t)=\frac{1}{r}\) and \(\lim _{t \rightarrow \infty } \mathrm{e}^{-2rt} I_8(t)=\frac{1}{2\mu _{{\tilde{D}}-1}\tau -\gamma }\). Substituting these limits into (5.3) yields that, for any \(s\ge 0\),

$$\begin{aligned} \lim _{t \rightarrow \infty } \mathrm{e}^{-2rt} \mathbf {\sigma } (t,t+s)=\mathrm{e}^{rs} \lim _{t \rightarrow \infty } \mathrm{e}^{-2rt} \mathbf {v} (t). \end{aligned}$$
(5.6)

It follows that, for \(k\in \mathscr {K}\) and \(s>0\),

$$\begin{aligned} \lim _{t \rightarrow \infty } \mathrm{corr}\left( Z^{(k)}(t),Z^{(k)}(t+s)\right) =1, \end{aligned}$$

where \(\mathrm{corr}\) denotes correlation. This is not surprising since it is well known that

$$\begin{aligned} \mathrm{e}^{-rt} Z^{(k)}(t) \xrightarrow {\text {a.s.}} W^{(k)} \quad \text {as } t \rightarrow \infty , \end{aligned}$$

where \(\xrightarrow {\text {a.s.}}\) denotes almost sure convergence (i.e. convergence with probability 1) and \(W^{(k)}\) is a non-negative random which satsifies \(W^{(k)}=0\) if and only if the branching process goes extinct; see e.g.  Athreya and Ney (1972, Theorem V.7.2).

6 Unbounded degree distributions

The above results have all assumed that there is a maximum degree \(k_{\mathrm {max}}\). Suppose that is not the case, so the branching process has countably many types. For \(t\ge 0\), let \(\mathbf {Z}(t)=(Z_0(t), Z_1(t),\ldots )^{\!\top }\), where \(Z_i(t)\) denotes the number of individuals of type i alive at time t, and let \(Z(t)=\sum _{i=0}^\infty Z_i(t)\) denote the total number individuals alive at time t. (For ease of notation we drop explict reference to the type of the initial individual.) For \(k_{\mathrm {max}}=1,2,\ldots \), let \(\{\mathbf {Z}(t,k_{\mathrm {max}}):t \ge 0\}\) denote the branching process derived from \(\{\mathbf {Z}(t):t \ge 0\}\) by ignoring all individuals having type strictly greater than \(k_{\mathrm {max}}\) and any offspring of such individuals. For \(t \ge 0\), let \(Z(t, k_{\mathrm {max}})=\mathbf {1}^{\!\top }\mathbf {Z}(t,k_{\mathrm {max}})\) be the total number of individuals alive in \(\{\mathbf {Z}(t,k_{\mathrm {max}}):t \ge 0\}\) at time t. Now, for any \(t \ge 0\), \(Z(t, k_{\mathrm {max}})\) is monotonically increasing in \(k_{\mathrm {max}}\) and converges almost surely to Z(t) as \(k_{\mathrm {max}}\rightarrow \infty \). Thus, by the monotone convergence theorem, \(\mathbb {E}\left[ Z(t)\right] =\lim _{k_{\mathrm {max}}\rightarrow \infty }\mathbb {E}\left[ Z(t,k_{\mathrm {max}})\right] \).

The process \(\{\mathbf {Z}(t,k_{\mathrm {max}}):t \ge 0\}\) behaves like the branching process described in Sect. 2.3 but with infection rate \(\tau \) replaced by \(\tau (k_{\mathrm {max}})=\tau \mathbb {P}\left( {\tilde{D}} \le k_{\mathrm {max}}+1\right) \), and size-biased degree distribution \({\tilde{D}}\) replaced by \({\tilde{D}}(k_{\mathrm {max}})\), where

$$\begin{aligned} \mathbb {P}\left( {\tilde{D}}(k_{\mathrm {max}})=k\right) = {\left\{ \begin{array}{ll} \frac{{\tilde{p}}_k}{\mathbb {P}\left( {\tilde{D}} \le k_{\mathrm {max}}+1\right) } &{} \text { if } k=1,2,\ldots ,k_{\mathrm {max}}+1 ,\\ 0 &{} \text { if } k= k_{\mathrm {max}}+2,k_{\mathrm {max}}+3,\ldots . \end{array}\right. } \end{aligned}$$

The presence of \(k_{\mathrm {max}}+1\) rather than \(k_{\mathrm {max}}\) is because contacts with individuals having degree strictly greater than \(k_{\mathrm {max}}\!+\!1\) are ignored, as they yield individuals with effective degree (and hence type) strictly greater than \(k_{\mathrm {max}}\). Now \(\tau (k_{\mathrm {max}}) \!\rightarrow \! \tau \) and \(\mathbb {E}\left[ {\tilde{D}}(k_{\mathrm {max}})\right] \!\!\rightarrow \!\! \mathbb {E}\left[ {\tilde{D}}\right] \) as \(k_{\mathrm {max}}\!\!\rightarrow \!\! \infty \), so the expression (3.15) for the mean total population size at time t continues to hold in the unbouded degree case, provided that \(\mathbb {E}\left[ {\tilde{D}}\right] \!<\!\infty \), or equivalently that \(\mathbb {E}\left[ D^2\right] \!<\!\infty \). A similar argument shows that the expressions for the variance of Z(t) and the covariance of Z(t) and Z(s), derived in Sects. 4 and 5, respectively, continue to hold provided \(\mathbb {E}\left[ {\tilde{D}}^2\right] \!<\!\infty \), or equivalently \(\mathbb {E}\left[ D^3\right] \!<\!\infty \).

7 Probability of extinction

For \(t \ge 0\) and \(k\in \mathscr {K}\), let \(\pi _k(t)\!=\!\mathbb {P}\left( Z^{(k)}(t)\!=\!0\right) \) be the probability that the branching process is extinct at time t given that it starts with one individual of type k. Then in general

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}t}\pi _k(t) = -\omega _k \pi _k(t) + \omega _k P_k( \varvec{\pi } (t)) , \end{aligned}$$
(7.1)

where \( \varvec{\pi } (t)=(\pi _i(t))\). For our specific model, using (2.1) and (2.3), we have

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}t}\pi _k(t) = -(\gamma + \tau k) \pi _k(t) + \gamma + \tau k \pi _{k-1}(t) \sum _{l=0}^{k_{\mathrm {max}}-1} {\tilde{p}}_{l+1} \pi _l(t) . \end{aligned}$$
(7.2)

These equations are not amenable to closed-form solution. Note, however, that studies of time to extinction for network epidemics—e.g. Holme (2013)—have tended to be based on Monte Carlo methods, but (7.2) could provide a complementary approach that is numerically cheaper and more analytically tractable.

We will now consider three regimes in which asymptotic methods can be used to bound the real-time behaviour of the probabilities of extinction. In particular, we will see that early real-time behaviour is bounded by the death rates \(\omega _k\), while late-time behaviour is bounded by the asymptotic real-time growth rate r provided \(r>-\gamma \).

7.1 Late behaviour of the subcritical case

Suppose that \(r<0\), so the branching process is subcritical. For \(t \ge 0\) and \(k\in \mathbb {N}_0=\{0,1,\ldots \}\), we will work with the probability of survival \(q_k(t)=1-\pi _k(t)=\mathbb {P}\left( Z^{(k)}(t)>0\right) \). Now \(\mathbb {P}\left( Z^{(k)}(t)>0\right) \le \mathbb {E}\left[ Z^{(k)}(t)\right] \), so using (3.15) a simple upper bound for \(q_k(t)\), valid also in the unbounded degree setting using the results in Sect. 6, is

$$\begin{aligned} q_k(t) \le k\mu _{{\tilde{D}}-2}^{-1}\left( \mathrm{e}^{rt} - \mathrm{e}^{-\gamma t} \right) + \mathrm{e}^{-\gamma t} . \end{aligned}$$
(7.3)

Note that \(\mu _{{\tilde{D}}} < \infty \) is a necessary condition for \(r<0\). Under the stronger condition that \(\mu _{{\tilde{D}}^2}< \infty \), Windridge (2015) gives an exponential approximation, for large t, to a quantity closely related to \(q_k(t)\). He assumes that what we call type-0 individuals are dead. For \(k=1,2,\ldots \), let \({\hat{q}}_k(t)=\mathbb {P}\left( \sum _{i=1}^\infty Z_i^{(k)}(t)>0\right) \). Then, Windridge shows that there exists a constant \({\hat{c}} \in (0,1]\) such that, for any \(a < \min \{\tau , -r\}\),

$$\begin{aligned} {\hat{q}}_k(t) ={\hat{c}} k \mathrm{e}^{rt}\left( 1+O(k\mathrm{e}^{-at})\right) \quad \text{ as } t \rightarrow \infty , \end{aligned}$$
(7.4)

for any \(k \ge 1\). The constant \({\hat{c}}=\lim _{t\rightarrow \infty }\mathrm{e}^{-r t} {\hat{q}}_1(t)\). Note that for some practical purposes, \({\hat{q}}_k(t)\) may be of more interest than \(q_k(t)\), since type-0 individuals are unable to transmit infection. In particular, in “Appendix 5” we sketch the argument that for the case where \(r>-\gamma \) an analogous result to (7.4) holds, i.e. , for \(k \ge 1\),

$$\begin{aligned} {q}_k(t)\sim c k \mathrm{e}^{rt} \quad \text {as}\, t \rightarrow \infty ,\quad \text {where} \, {c}=\lim _{t\rightarrow \infty }\mathrm{e}^{-r t} q_1(t)>0 . \end{aligned}$$
(7.5)

(For real-valued functions, f and g say, \(f(t) \sim g(t)\) as \(t \rightarrow \infty \) if \(\lim _{t \rightarrow \infty } f(t)/g(t)=1\).)

For the case where \(r<-\gamma \) (so, from (3.4), \(\mu _{{\tilde{D}}}<2\)) we show in “Appendix 5” that if \(\mu _{{\tilde{D}}^2}< \infty \) then, for \(k \ge 0\),

$$\begin{aligned} {q}_k(t)\sim (1-k\mu _{{\tilde{D}}-2}^{-1})\mathrm{e}^{- \gamma t} \quad \text {as}\, t \rightarrow \infty . \end{aligned}$$
(7.6)

Note that in this case the asymptotic behaviour of the survival probability \({q}_k(t)\) is independent of the infection rate \(\tau \). The case \(r<-\gamma \) could occur, for example, at the end of an epidemic where \(\tau \gg \gamma \). Such an epidemic would consist primarily of transmission events at early times, with the late behaviour dominated by recovery events.

7.2 Late behaviour of the supercritical case

An approximation to \(q_k(t)\) in the supercritical case (\(r>0\)) can be obtained by exploiting the fact that a supercritical branching process conditioned on extinction is probabilistically equivalent to a subcritical branching process. For \(k\in \mathbb {N}_0\), let \(T^{(k)}=\inf \{t \ge 0:Z^{(k)}(t)=0\}\) denote the extinction time of the branching process given that it starts with one individual of type k, where \(T^{(k)}=\infty \) if the branching process survives forever, and let \(\pi _k=\mathbb {P}\left( T^{(k)}<\infty \right) =\pi _k(\infty )\) be the probability that the branching process ultimately goes extinct. Then,

$$\begin{aligned} q_k(t)=1-\pi _k+\pi _k \mathbb {P}\left( T^{(k)}>t|T^{(k)}<\infty \right) . \end{aligned}$$
(7.7)

Let \(\{\tilde{ \mathbf {Z} }^{(k)}(t):t \ge 0 \}\) be distributed as \(\{ \mathbf {Z} ^{(k)}(t):t \ge 0 |T^{(k)}<\infty \}\). Then it follows from  Waugh (1958, Sect. 5), that \(\{\tilde{ \mathbf {Z} }^{(k)}(t):t \ge 0 \}\) is also a continuous-time multitype Markov branching process, in which the lifetime of a typical type-k individual has an exponential distribution with rate \(\gamma +\tau k\), as at (2.1), but when it dies its offspring is now distributed as follows:

$$\begin{aligned} \mathbb {P}\left( \mathrm {Offspring} = \varnothing | \mathrm {Parent\ type} = k \right)&= \frac{1}{\pi _k}\frac{\gamma }{\gamma + \tau k} , \nonumber \\ \mathbb {P}\left( \mathrm {Offspring} = \{k-1,l\} | \mathrm {Parent\ type} = k \right)&= \frac{\pi _{k-1} \pi _l}{\pi _k}\frac{\tau k {\tilde{p}}_{l+1}}{\gamma + \tau k} \quad (l\in \mathbb {N}_0) . \end{aligned}$$
(7.8)

Suppose now that there is a maximum degree size \(k_{\mathrm {max}}\). Let \(\tilde{ \varvec{{\varOmega }} }=[\tilde{{\varOmega }}_{l,k}]\) be the \((k_{\mathrm {max}}+1) \times (k_{\mathrm {max}}+1)\) matrix with elements given by

$$\begin{aligned} \tilde{{\varOmega }}_{l,k} = \frac{\pi _{l-1}\pi _k}{\pi _l}\tau l \left( {\tilde{p}}_{k+1} + \delta _{l,k+1}\right) -(\gamma + \tau l) \delta _{l,k} \qquad (l,k\in \mathscr {K}). \end{aligned}$$

Then, recalling (3.2),

$$\begin{aligned} \mathbb {E}\left[ {\tilde{Z}}^{(k)}(t)\right] = \mathbf {u} _k^{\!\top }\mathrm{e}^{\tilde{ \varvec{{\varOmega }} }t} \mathbf {1}, \end{aligned}$$
(7.9)

where \({\tilde{Z}}^{(k)}(t)={\tilde{Z}}^{(k)}_0(t)+{\tilde{Z}}^{(k)}_1(t)+\ldots +{\tilde{Z}}^{(k)}_{k_{\mathrm {max}}}(t)\).

Let \(\tilde{r}\) denote the dominant eigenvalue of \(\tilde{ \varvec{{\varOmega }} }\) and note that \(\tilde{r}<0\). For \(t \ge 0\) and \(k=0,1,\ldots , k_{\mathrm {max}}\), let \({\tilde{q}}_k(t)=\mathbb {P}\left( {\tilde{Z}}^{(k)}(t)>0\right) \) be the probability that the branching process \(\{\tilde{ \mathbf {Z} }^{(k)}(t):t \ge 0 \}\) is not extinct at time t given that it starts with one individual of type k. Then we expect that arguments similar to those used in the proof of Heinzmann (2009, Theorem 3.1), will show that there exists constants \(\tilde{c}_1, \tilde{c}_2,\ldots ,\tilde{c}_{k_{\mathrm {max}}}\), satisfying \(0<\tilde{c}_k<\infty \), such for \(k=1,2,\ldots ,k_{\mathrm {max}}\),

$$\begin{aligned} {\tilde{q}}_k(t) =\tilde{c}_k \mathrm{e}^{\tilde{r}t}\left( 1+o(\mathrm{e}^{-\tilde{\gamma }t})\right) \quad \text{ as } t \rightarrow \infty , \end{aligned}$$
(7.10)

for any \(\tilde{\gamma }>0\). It then follows using (7.7) that

$$\begin{aligned} q_k(t)=1-\pi _k+\pi _k \tilde{c}_k \mathrm{e}^{\tilde{r}t}\left( 1+o(\mathrm{e}^{-\tilde{\gamma }t})\right) \quad \text{ as } t \rightarrow \infty . \end{aligned}$$
(7.11)

Heinzmann (2009, Theorem 3.1), cannot be applied directly as it assumes that the matrix \(\tilde{ \varvec{{\varOmega }} }\) is irreducible. We do not consider it here but we expect that Heinzmann’s proof can be extended to our situation. If we assume that type-0 individuals are dead and only consider initial individuals of types \(1,2,\ldots ,k_{\mathrm {max}}-1\) (recall that only initial infectives can have type \(k_{\mathrm {max}}\)) then \(\tilde{ \varvec{{\varOmega }} }\) becomes a \((k_{\mathrm {max}}-1) \times (k_{\mathrm {max}}-1)\) irreducible matrix. Heinzmann (2009, Theorem 3.1), then yields (7.10); note that now \(\pi _k\) is replaced by \({\bar{\pi }}_k=1-\lim _{t \rightarrow \infty } {\bar{q}}_k(t)\) \((k=1,2,k_{\mathrm {max}}-1)\) in the definition of \(\tilde{ \varvec{{\varOmega }} }\) and \({\tilde{q}}_k(t)={\tilde{q}}_k(t)=\mathbb {P}\left( \sum _{i=1}^{k_{\mathrm {max}}-1}{\tilde{Z}}^{(k)}_i(t)>0\right) \). The approximation (7.11) then holds with \(q_k(t)\) and \(\pi _k\) replaced by \({\bar{q}}_k(t)\) and \({\bar{\pi }}_k\), repsectively. Moreover, if we then let \(\tilde{ \mathbf {f} }_1^{\!\top }\) and \(\tilde{ \mathbf {b} }_1\) be left and right eigenvectors of \(\tilde{ \varvec{{\varOmega }} }\) corresponding to the eigenvalue \(\tilde{r}\), satisfying \(\tilde{ \mathbf {f} }_1^{\!\top }\tilde{ \mathbf {b} }_1=1\), then \(\tilde{c}_k=\left( \mathbf {u} _k^{\!\top }\tilde{ \mathbf {b} }_1\right) h^*\), where \(h^*=\lim _{t \rightarrow \infty } \mathrm{e}^{-\tilde{r}t}\tilde{ \mathbf {f} }_1^{\!\top }\tilde{ \mathbf {q} }(t)\) and \(\tilde{ \mathbf {q} }(t)=\left( {\tilde{q}}_i(t),{\tilde{q}}_2(t), \ldots , {\tilde{q}}_{k_{\mathrm {max}}-1}(t)\right) ^{\!\top }\). Unfortunately, unlike with \( \varvec{{\varOmega }} \), there do not appear to be closed-form expressions for \(\tilde{r}\) and its associated eigenvectors.

7.3 Early behaviour and matched asymptotics

Matched asymptotics is a standard technique in mathematical biology for writing down approximations to non-linear models that match known asymptotic behaviour (Murray 2002, 2003). While numerical solution of the ODEs (7.2) is efficient (as we have noted above) we now obtain a crude approximation to the full system that takes a closed form in terms of elementary functions.

First note that for \(k\in \mathscr {K}\), \(\pi _k(0) = 0\) and \(\pi _k(t)\) is monotonically increasing with t. If we neglect the quadratic terms in \(\pi \) in (7.2) then, since these are only positive and increasing over time, we get a lower bound for the extinction probabilities:

$$\begin{aligned} \pi _k(t) \ge \pi _k^{(0)}(t)= \frac{\gamma }{\omega _k} \left( 1 - \mathrm{e}^{-\omega _k t} \right) . \end{aligned}$$
(7.12)

Note that in standard matched asymptotics, we would identify a small parameter from a ratio of rate constants as the basis for a systematic approximation scheme (Murray 1984); an alternative would be to approximate systematically by, for example, letting \(\pi _k^{(1)} = \pi _k - \pi _k^{(0)}\), substituting into (7.2) and neglecting quadratic terms to give a linear set of equations for the next order of approximation. Here we consider only the lowest order approximation, and hence define an ‘internal’ solution for the survival probability as:

$$\begin{aligned} q_k^{(I)}(t) = 1-\pi _k^{(0)}(t) =\frac{1}{\omega _k} \left( \tau k + \gamma \mathrm{e}^{-\omega _k t} \right) . \end{aligned}$$
(7.13)

Next, supposing we are in the subcritical case so that our result (7.5) holds. We will call this the ‘external’ solution

$$\begin{aligned} q_k^{(E)}(t) = ck\mathrm{e}^{r t}. \end{aligned}$$
(7.14)

To fix the constant c, we match the late behaviour of the internal solution with the early behaviour of the external solution:

$$\begin{aligned} \overline{q} = \lim _{t\rightarrow \infty } q_k^{(I)}(t) = \lim _{t\rightarrow 0} q_k^{(E)}(t) \quad \Rightarrow \quad q_k^{(E)}(t) = \frac{\tau k}{\omega _k} \mathrm{e}^{r t} . \end{aligned}$$
(7.15)

Finally, the matched asymptotic solution is

$$\begin{aligned} q_k^{(A)}(t) = q_k^{(I)}(t) + q_k^{(E)}(t) - \overline{q} = \frac{1}{\omega _k} \left( \tau k \mathrm{e}^{r t} + \gamma \mathrm{e}^{-\omega _k t} \right) . \end{aligned}$$
(7.16)

We compared this approximation as well as the internal and external solutions to the exact solution \(q_k(t)\), with results shown in Fig. 1. As advertised, this is a relatively crude approximation, but is expressed in terms of elementary functions and satisfies known asymptotic limits.

Fig. 1
figure 1

Extinction probability for a subcritical epidemic compared to approximations. Results are for a 3-regular graph with \(\gamma = 1\) and values of \(\tau \) indicated in the figure titles. The internal and external solutions each fail severely at certain points, but the approximate solution crudely captures the overall behaviour (colour figure online)

8 Fluctuations in the emerging phase of a major outbreak

We now consider the early behaviour of supercritical epidemics that take off (i.e. do not go extinct early on but ultimately end owing to long-term depletion of susceptibles). The early stages of such an epidemic are approximated by the branching process defined in Sect. 2.3 but conditioned on non-extinction. It is straightforward to adapt the results on means and variances in Sects. 3 and 4 to condition on \(Z^{(k)}(t)>0\). Elementary calculation shows that, for \(t \ge 0\) and \(k\in \mathbb {N}_0\),

$$\begin{aligned} \mathbb {E}\left[ Z^{(k)}(t)\Big |Z^{(k)}(t)>0\right]&=\frac{\mathbb {E}\left[ Z^{(k)}(t)\right] }{q_k(t)},\\ \mathrm{var}\left( Z^{(k)}(t)\Big |Z^{(k)}(t)>0\right)&=\frac{\mathrm{var}\left( Z^{(k)}(t)\right) }{q_k(t)}-\pi _k(t)\left( \frac{\mathbb {E}\left[ Z^{(k)}(t)\right] }{q_k(t)}\right) ^2. \end{aligned}$$

Expressions for \(\mathbb {E}\left[ Z^{(k)}(t)\Big |Z^{(k)}(t)>0\right] \) and \(\mathrm{var}\left( Z^{(k)}(t)\Big |Z^{(k)}(t)>0\right) \) then follow using (3.15) and (4.4), respectively, though there is no closed-form formula for \(q_k(t)\) or \(\pi _k(t)\). Note that, assuming \(r>0\) so \(\pi _k<1\),

$$\begin{aligned} \lim _{t \rightarrow \infty }\frac{\mathrm{var}\left( Z^{(k)}(t)\Big |Z^{(k)}(t)>0\right) }{\mathbb {E}\left[ Z^{(k)}(t)\Big |Z^{(k)}(t)>0\right] ^2} =\left( 1-\pi _k\right) \lim _{t \rightarrow \infty }\frac{\mathrm{var}\left( Z^{(k)}(t)\right) }{\mathbb {E}\left[ Z^{(k)}(t)\right] ^2}-\pi _k , \end{aligned}$$

which depends on the degree k of the initial infective.

The diffusion approximation studied in Graham and House (2014) corresponds to the case where the number of infectives at time \(t=0\) is large. Return to the case where there is a maximum degree \(k_{\mathrm {max}}\) and suppose that the branching process does not go extinct. Then it follows from Athreya and Ney (1972, Theorem V.7.2), that, for any \(k\in \mathscr {K}\),

$$\begin{aligned} \frac{ \mathbf {Z} ^{(k)}(t)}{Z^{(k)}(t)} \xrightarrow {\text {a.s.}} \mathbf {w} _1 \quad \text {as } t \rightarrow \infty , \end{aligned}$$

where \( \mathbf {w} _1\), given by (3.10), is a left eigenvector of \( \varvec{{\varOmega }} \) corresponding to the dominant eigenvalue \(\lambda _1=r\). Thus if an epidemic takes off and is still in its exponentiallly growing phase then the relative frequencies of the different types of infectives will be close to \( \mathbf {w} _1\). Hence, we now assume that the initial number of individuals in the branching process \(Z(0)=K\), where K is large, and that \(Z_i(0) \approx w_{1,i} K\) for \(i \in \mathscr {K}\). Label the initial individuals \(1,2,\ldots ,K\). Then, for \(t \ge 0\), the total population size is \(Z(t)={\hat{Z}}_1(t)+ {\hat{Z}}_2(t)+\ldots +{\hat{Z}}_K(t)\), where \({\hat{Z}}_i(t)\) denotes the total number of descentants of the initial individual i that are alive at time t, including i itself if it is still alive. Thus, \(\mathbb {E}\left[ Z(t)\right] =\sum _{i=1}^K \mathbb {E}\left[ {\hat{Z}}_i(t)\right] \), for all \(t \ge 0\), and, since the processes \(\{{\hat{Z}}_i(t):t \ge 0\}\) \((i=1,2,\ldots ,K)\) are mutually independent, \(\mathrm{var}\left( Z(t)\right) =\sum _{i=1}^K \mathrm{var}\left( {\hat{Z}}_i(t)\right) \), for all \(t \ge 0\), and \(\mathrm{cov}\left( Z(t), Z(s)\right) = \sum _{i=1}^K \mathrm{cov}\left( {\hat{Z}}_i(t), {\hat{Z}}_i(s)\right) \), for all \(t,s \ge 0\).

Note that (3.9) implies that

$$\begin{aligned} \mathbf {w} _1^{\!\top } \mathbf {n} =\sum _{i=0}^{k_{\mathrm {max}}} i w_{1,i}=\mu _{{\tilde{D}}-2} \qquad \text{ and }\qquad \mathbf {w} _1^{\!\top } \mathbf {n} _2= \sum _{i=0}^{k_{\mathrm {max}}} i^2 w_{1,i}=\frac{\mu _{{\tilde{D}}-2} \mu _{({\tilde{D}}-1)^2+1}}{\mu _{{\tilde{D}}}} . \end{aligned}$$

Assuming that the above approximation is exact, then, for \(t \ge 0\), it follows from (3.15) that

$$\begin{aligned} \mathbb {E}\left[ Z(t)\right] =K \mathbf {w} _1^{\!\top } \mathbf {m} (t)=K \mathrm{e}^{rt} \end{aligned}$$
(8.1)

and, after a little algebra, it follows from (4.4) that

$$\begin{aligned} \mathrm{var}\left( Z(t)\right)&= K \mathbf {w} _1^{\!\top } \mathbf {v} (t)\nonumber \\&=K\left( \gamma \left[ I_9(t)+\mu _{{\tilde{D}}}^{-1}\mu _{{\tilde{D}}-2}^{-1}\left( \sigma _{{\tilde{D}}}^2+2\right) I_4(t)\right] \right. \nonumber \\&\quad \left. + \tau \mu _{{\tilde{D}}-2}^{-1}\left[ \mu _{{\tilde{D}}-2}^2 I_9(t)+\sigma _{{\tilde{D}}}^2 I_4(t)\right] \right) , \end{aligned}$$
(8.2)

where \(\sigma _{{\tilde{D}}}^2=\mathrm{var}({\tilde{D}})\) and

$$\begin{aligned} I_9(t)=\frac{\mathrm{e}^{rt}\left( \mathrm{e}^{rt}-1\right) }{r}. \end{aligned}$$

Comparison of (8.1) and (8.2) with the diffusion-based result of Graham and House (2014) in the limit of large t gives agreement when \(\gamma = 0\) (i.e. for the SI model) but not for \(\gamma >0\). We believe that this is due to the fact that the diffusion model was only four dimensional, so a heuristic argument (given in Sect. 3.3 of Graham and House 2014, which gave results that were in good agreement with simulation) about the neighbourhood of an infective node had to be made, in contrast to the approach here that deals with each effective degree explicitly and so has \(k_{\mathrm {max}}+1\) dimensions. The argument about the neighbourhood around an infective tries to account for correlations caused by variability in recovery times, and so if \(\gamma \rightarrow 0\) then these correlations do not exist.

Recent work by Constable and McKane (2014) considered the reduction of high-dimensional stochastic models to low-dimensional diffusions and this approach was shown to be asymptotically exact for some systems in the small-noise limit by Parsons and Rogers (2015). It is an open question whether the argument in Sect. 3.3 of Graham and House (2014) could be justified rigorously by a similar argument, however we note that a branching-process approach makes fewer assumptions than a low-dimensional diffusion limit and so will be more generally applicable.

Considering further results that can be obtained, it follows from (5.3) and a little algebra that, for \(0 \le t \le s\),

$$\begin{aligned} \mathrm{cov}\left( Z(t), Z(s)\right)&=\mathrm{e}^{-\gamma (s-t)} \mathrm{var}\left( Z(t)\right) +K\mu _{{\tilde{D}}-2}^{-1}\left( \mathrm{e}^{r(s-t)}-\mathrm{e}^{-\gamma (s-t)}\right) \nonumber \\&\quad \times \left\{ \gamma \mu _{{\tilde{D}}}^{-1}\left[ \mu _{({\tilde{D}}-1)^2+1}I_9(t)- \left( \sigma _{{\tilde{D}}}^2+2\right) I_6(t)\right] \right. \nonumber \\&\quad +\left. \tau \left[ \mu _{{\tilde{D}}-2}^2 I_9(t)-\sigma _{{\tilde{D}}}^2 I_6(t)\right] \right\} . \end{aligned}$$
(8.3)

It seems plausible that these results extend to the case when there is no maximal degree but that would involve results for countably infinite matrices which we do not consider here.

Recall that the processes \(\{{\hat{Z}}_i(t):t \ge 0\}\) \((i=1,2,\ldots ,K)\) are mutually independent. It follows using the central limit theorem that, for sufficiently large K, the process \(\{Z(t):t \ge 0\}\), which approximates the prevalence of infection during the early growth of an epidemic, is approximately Gaussian with mean function given by (8.1) and covariance function given by (8.3).

9 Numerical examples

9.1 Forward simulations

We conducted a series of numerical experiments to provide specific examples of the general results presented here. \(M=10^4\) Monte Carlo simulations were performed on three different configuration model networks, each of size \(N=10^4\), and with the degree distributions shown in Fig. 2 Row (a). (Note that each Monte Carlo simulation consisted of first simulating a network and then simulating a single epidemic on it.) Two different scenarios were considered. In the first – most commonly considered in the literature when simulations are compared to analytic approaches – \(\mathrm {time}=0\) was defined as the first point when prevalence is at a given level, K. In our simulations we took \(K=100\), but in general K should take a value where the probability of extinction has become negligible, but the depletion of the susceptible population has not had a significant effect on the epidemic dynamics. In the second, each epidemic was started from one node, picked uniformly at random, so the probability of extinction played a major role. This scenario is less commonly considered when comparing real-time simulated epidemics to differential equation models because the latter are typically designed to hold when the epidemic is already established.

Since analytic results for the probabilities of extinction \(\pi _k(t)\) are not available, the branching process results required numerical integration of ordinary differential equations (in our case using Runge–Kutta methods). We stress that the computational effort required to do this is much less than that involved in performing Monte Carlo simulations, and has the benefit of not depending on N.

The results for the first approach (restarting time at the first time prevalence reaches 100) are given in Fig. 2. Row (b) shows sample trajectories (which all agree on prevalence at time 0). Row (c) shows the simulated mean after time 0 on a logarithmic scale, which initially grows at the constant rate predicted by the branching process model, and then reduces as the susceptible population is depleted. Row (d) shows the variance, which has not converged to its asymptotic growth regime by the time prevalence is equal to 100, an effect that is captured by the branching process model. The variance does not take its largest value at the peak prevalence, but instead has local maxima before and after the peak.

Fig. 2
figure 2

Epidemic simulations that set \(\mathrm {time}=0\) when prevalence is equal to 100. Parameters are \(\tau =2\), \(\gamma =1\) throughout. a Degree distribution histograms, b 100 sample trajectories, c Mean prevalence. Black solid simulations; Red dashed branching process d Variance in prevalence. Black solid simulations; Red dashed branching process (colour figure online)

Fig. 3
figure 3

Epidemic simulations starting from one node selected uniformly at random. Parameters are \(\tau =2\), \(\gamma =1\) throughout. Degree distributions are as for Fig. 2 above. a 100 sample trajectories, b Extinction probabilities, Black solid simulations; Red dashed branching process, c Mean prevalence. Black solid simulations; Red dashed branching process, d Variance in prevalence. Black solid simulations; Red dashed branching process (colour figure online)

Figure 3 shows the results for the second approach in which there is one randomly chosen initial infective at time 0. Row (a) shows some sample trajectories. Row (b) shows the extinction probabilities, which are accurately captured in the branching process model until very close to the end of the epidemic when prevalence is low and extinction becomes likely again. Row (c) shows the mean, which does not start growing at a constant rate with the convergence rate accurately captured in the branching process model. Row (d) shows the variance and convergence onto its asymptotic value; in this case there is a single maximum just before the peak in prevalence.

Another important point is that while mean numbers infected are comparable between Figs. 2 and 3, the variability in the time for the epidemic to take off, as well as the contribution from extinct epidemics, makes the real-time variance in Fig. 3 orders of magnitude larger than in Fig. 2.

9.2 Statistical inference

In order to demonstrate the potential use of the real-time effective degree branching process model for statistical inference, we carried out a simulation study. Here we simulated one epidemic that took off on a configuration model network of size \(N=10^6\) with degree distribution \(D^{(3)}\) as in the right-hand column of Fig. 2 (\(d^{(3)}_1 = 1/8\), \(d^{(3)}_3 = 5/6\), \(d^{(3)}_9 = 1/24\)) and true rates \({\tau }_0 = 2\), \({\gamma }_0 = 1\). Letting I(t) be the prevalence of infection in the network model, we set time \(t=0\) when \(I(t)=100\) for the first time and make 40 evenly-spaced observations (with gap \(\delta t = 0.05\) between each) of I(t) up to \(t_{\mathrm {end}}=2\).

We then define an approximate likelihood based on the methods of Ross et al. (2006), in which a Gaussian process approximation based on known first and second moments is used, which will be more accurate for larger N, larger I(0) and smaller \(\delta t\). There should be, however, no a priori obstacle to fitting our model to data on smaller populations even with incomplete data, for example by using Markov chain Monte Carlo methods to perform multiple imputation as suggested by O’Neill and Roberts (1999).

Explicitly, we let the probability density function f for sequential observations be given by

$$\begin{aligned} f(I(t+\delta t)|I(t)) = \mathscr {N}(\mathbb {E}[Z(t + \delta t) | Z(t)=I(t)] , \text {var}(Z(t + \delta t) | Z(t)=I(t)) ,\nonumber \\ \end{aligned}$$
(9.1)

where \(\mathscr {N}(m,V)\) is the probability density function of a normal distribution with mean m and variance V, and the expectation and variance of \(Z(t+\delta t)|Z(t)=I(t)\) are given by the results of Sect. 8 above. The likelihood is then

$$\begin{aligned} L = \prod _{t\in \{0,\delta t, \ldots , t_{\mathrm {end}}-\delta t\}} f(I(t+\delta t)|I(t)) . \end{aligned}$$
(9.2)

We consider values of this likelihood across the range of rate constant parameters \(\tau \) and \(\gamma \) under two different degree distributions: the correct one, \(D^{(3)}\), and a misspecified degree distribution \(D^{(1)}\), which is the one used in the left-hand column of Fig. 2 (\(d^{(1)}_3 = 1\)).

Figure 4 shows the first quarter of the simulated epidemic together with the Gaussian process approximation, as well as likelihood surfaces for the correct and misspecified degree distributions. Performing maximum likelihood estimation using MATLAB’s mle() function allows us to obtain point estimates for parameters \({\hat{\tau }}\) and \({\hat{\gamma }}\), as well as asymptotic 95% confidence intervals and the parameter covariance matrix \({\hat{C}}\) from the inverse Hessian. We quote results to 2 significant figures; the asymptotic approximations also give very slightly negative lower confidence intervals for \({\hat{\gamma }}\) which we round up to 0. For the correct degree distribution we obtain

$$\begin{aligned} \begin{pmatrix} {{\hat{\tau }}}^{(3)}\\ {\hat{\gamma }}^{(3)} \end{pmatrix} = \begin{pmatrix} 1.9 &{}\quad [1.4,2.4]\\ 0.8 &{}\quad [0,1.7] \end{pmatrix} , \qquad \hat{\varvec{C}}^{(3)} = \begin{pmatrix} 0.069 &{}\quad 0.11\\ 0.11 &{}\quad 0.19 \end{pmatrix} , \end{aligned}$$
(9.3)

and for the misspecified degree distribution we obtain

$$\begin{aligned} \begin{pmatrix} {{\hat{\tau }}}^{(1)}\\ {\hat{\gamma }}^{(1)} \end{pmatrix} = \begin{pmatrix} 3.2 &{}\quad [2.3,4.0]\\ 0.8 &{}\quad [0,1.7] \end{pmatrix} , \qquad \hat{\varvec{C}}^{(1)} = \begin{pmatrix} 0.0045 &{}\quad 0.011\\ 0.011 &{}\quad 0.030 \end{pmatrix} . \end{aligned}$$
(9.4)

This shows that knowledge of the correct distribution allows both \(\tau \) and \(\gamma \) to be estimated; although as would be expected the early asymptotic growth rate r is much more closely constrained by simulated data than other directions in parameter space. It also shows that misspecification of the degree distribution allows r to be identified, but biases the estimate of, in this case, \(\tau \).

Fig. 4
figure 4

Simulation study. The top plot a shows the first quarter of the timepoints a Simulated epidemic, b Likelihood surfaces (observations as black dots, full trajectory as black solid line, Gaussian approximation mean as red dashed line, Gaussian approximation 95% prediction interval as red dotted line). The bottom plots b show likelihood surfaces for a Gaussian approximation model as described in Sect. 9.2. True parameters are \({\tau _0}=2\), \({\gamma _0}=1\). Correct degree distribution \(D^{(3)}\) is as given in the third columns of Figs. 2 and 3 above, and the misspecified distribution \(D^{(1)}\) is the distribution from the first column. Likelihood at a point is proportional to the intensity of shading, and with three straight lines, corresponding to the true values of \(\tau , \gamma \) and r, are shown each plot as dashed red lines (color figure online)

10 Concluding comments

10.1 Summary of results

In this paper, we have provided explicit closed-form expressions for the real-time mean, variance and covariance function for disease prevalence during the early stages of the Markovian SIR model on a configuration model network, as well as deriving differential equations for the probabilities of extinction over time that are relatively numerically cheap to solve. These allow for a more explicit treatment of e.g. rates of convergence to asymptotic behaviour than has previously been possible.

10.2 Future directions

We believe that the methods of real-time, multitype branching processes could be more widely applied in infectious disease epidemiology, since they provide results concerning extinction and variance that are not available using deterministic differential equation models. For example, the effective-degree based methodology presented here may be extended to include degree correlation (e.g. in the sense of Newman 2002) by keeping track of the actual, as well as effective, degrees of individuals, though the type space becomes larger and explicit analytic results are unlikely to be available. We note that there is increasing interest in the eradication of infections (e.g. Klepac et al. 2013) and that arguably calculating extinction probabilities and variability in outbreak sizes is of equal or greater importance in this context than calculation of mean behaviour.

The explicit closed-form expressions derived have the potential to enhance statistical work on epidemic prevalence curves. In particular, many empirically observed epidemics of human pathogens exhibit more variability around the trend than simple models would predict (see Black et al. 2014, particularly Section 1, for a discussion of this), which can bias parameter estimation if an insufficiently variable model is used. Application of our methods to real data would be an interesting extension of our work.

The possibility of a more general non-Markovian stochastic epidemic being approximated by an appropriate real-time branching process is raised by the results of Barbour and Reinert (2013) and it would be interesting to investigate whether our analysis could be adapted to this scenario.

Finally, there is the question of low-dimensional PGF-based modelling of the whole network epidemic that incorporates stochasticity accurately. For example, the work of Miller (2014) considered accounting for early fluctuations and Graham and House (2014) considered a diffusion approximation once early fluctuations were negligible, but the results presented here as well as those of Barbour and Reinert (2013) suggest that a more unified low-dimensional stochastic approach that explicitly models early fluctuations may be possible.