1 Introduction

A classical approach to epidemiological modelling is the use of ordinary differential equations in which the population is divided into different compartments; see for instance [2] for a historical exposé. Those equations attempt to capture empirical dynamics. The scaling is, however, not always clear; for instance, it can be unclear whether the involved variables describe fractions or densities of individuals in the different compartments. Scaling is an inherent problem for ordinary differential equations describing population dynamics or epidemiological evolution for small population sizes, since the numbers of individuals are discrete quantities, in contrast to the solutions of the ordinary differential equations.

Another inherent problem for ordinary differential equation descriptions, at least for small populations, is that prediction error is not taken into consideration, which is why models that involve randomness are sought. An often used stochastic approach is to add, to the right-hand side of an epidemiological ordinary differential equation, a noise term representing future uncertainty or a random environment. Such an equation is then formulated as a stochastic differential equation. The added noise is usually of such a type that global existence and uniqueness of a solution can be obtained, see for instance [1, 3, 4, 6].

In [5, 11–14], the modelling approach is instead based on birth-death type continuous-time discrete-valued Markov chains of a certain parameter-dependent structure, directly applicable to compartment models with a constant total population size, where the parameter is the population size. The scaled Markov chains then describe the fraction of individuals in each compartment. For huge population sizes the scaled Markov chains can be approximated by classical epidemiological ordinary differential equations, and for intermediate population sizes they can be approximated by stochastic differential equations with square root diffusion terms. A solution of such an equation is defined until the first hitting time of the axes. In [11–14], the solution of the stochastic differential equation is killed thereafter, meaning that the stochastic differential equation approximation is not applied after such a hitting time, since the epidemiological dynamics may afterwards need another model. By this continuous-time Markov-chain approach with constant total population size, the variables of the limiting classical epidemiological ordinary differential equation are clarified to be the fractions of individuals in each compartment.

In this paper the population size is allowed to change over time, due to deaths or migration, which means that the population size cannot be used as a scaling parameter and the results in [5, 11–14] cannot be applied directly. Instead, it is assumed that the number of individuals in each compartment in a region with a given area is described by a birth-death type continuous-time discrete-valued Markov chain with the area size serving as a scaling parameter. The scaled Markov chain then describes the areal density of individuals in each compartment. Here the scaled continuous-time Markov chains can be approximated by classical epidemiological ordinary differential equations for large region areas and by approximative stochastic differential equations for intermediate region sizes. This means that the variables of the limiting classical epidemiological ordinary differential equation here describe densities of individuals in each compartment. The interpretation of the involved parameters then differs somewhat from the interpretation of the parameters for a population size scaled continuous-time Markov chain.

In Sect. 2, birth-death type continuous-time Markov chain modelling is presented. In Sect. 3, a critical part for existence and uniqueness of a solution of the continuous-time Markov chain is pointed out. In Sect. 4, an initial reproduction number under the population size scaling with constant population size is compared with an initial reproduction number under area scaling. In Sect. 5, convergence of scaled continuous-time Markov chains to corresponding solutions of ordinary differential equations is motivated. In Sect. 6, diffusion approximation of area-scaled continuous-time Markov chains is considered for the case with no emigration (but with immigration, births and deaths allowed), together with a limiting result for the probability that the diffusion approximation is outside a box \((\epsilon ,M)^{3}\), \(0<\epsilon <M\), before a horizon time T. Also in Sect. 6, the diffusion approximation is compared to stochastic perturbation of parameters; that is, an approximative stochastic differential equation of a continuous-time Markov chain is compared with a stochastic differential equation for which the noise is added to the right-hand side of an ordinary differential equation. In Sect. 7, numerical simulations of area-scaled continuous-time Markov chains, their diffusion approximations and numerical solutions of the corresponding ordinary differential equations are illustrated under different values of the parameters describing emigration, immigration, births and deaths. In an additional appendix, the situation where all mentioned parameters are included is presented.

2 Time-continuous Markov-chain epidemiological modelling

A typical introductory example in epidemic modelling is the so-called SIR model, suitable for huge populations,

$$ \frac {d}{dt}\left [ \textstyle\begin{array}{c} s(t) \\ i(t) \\ r(t)\end{array}\displaystyle \right ] =\left [ \textstyle\begin{array}{c}-\beta s(t)i(t)\\ \beta s(t) i(t)-\gamma i(t) \\\gamma i(t)\end{array}\displaystyle \right ], $$
(1)

where \(s(t)\), \(i(t)\), and \(r(t)\) are fractions of susceptible, infected and recovered individuals, and the parameters β and γ represent a transmission rate and a recovery rate, respectively (see e.g. [2]). Here, the recovered are, for simplicity, supposed to never be infected again.
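As an illustration of (1), the system can be integrated numerically. The following is a minimal sketch using the classical fourth-order Runge-Kutta scheme; the function names, step count and parameter values (β = 0.5, γ = 0.1, initial fractions 0.99, 0.01, 0) are assumptions chosen only for the example and are not taken from the paper.

```python
def sir_rhs(state, beta, gamma):
    """Right-hand side of the SIR system (1) for state = (s, i, r)."""
    s, i, r = state
    return (-beta * s * i, beta * s * i - gamma * i, gamma * i)


def rk4_sir(s0, i0, r0, beta, gamma, t_end, steps):
    """Integrate (1) on [0, t_end] with the classical Runge-Kutta scheme."""
    h = t_end / steps
    state = (s0, i0, r0)
    path = [state]
    for _ in range(steps):
        k1 = sir_rhs(state, beta, gamma)
        k2 = sir_rhs(tuple(x + 0.5 * h * k for x, k in zip(state, k1)), beta, gamma)
        k3 = sir_rhs(tuple(x + 0.5 * h * k for x, k in zip(state, k2)), beta, gamma)
        k4 = sir_rhs(tuple(x + h * k for x, k in zip(state, k3)), beta, gamma)
        state = tuple(
            x + h * (a + 2 * b + 2 * c + d) / 6
            for x, a, b, c, d in zip(state, k1, k2, k3, k4)
        )
        path.append(state)
    return path


# Illustrative parameter values (assumptions, not from the paper).
path = rk4_sir(0.99, 0.01, 0.0, beta=0.5, gamma=0.1, t_end=100.0, steps=1000)
s_end, i_end, r_end = path[-1]
```

The three right-hand sides of (1) sum to zero, so the numerical solution keeps \(s+i+r\) constant up to rounding error.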

Applying a general setup in [7, 10–14], a natural model for small populations is

$$\begin{aligned} \left [ \textstyle\begin{array}{c} s_{n}(t) \\ i_{n}(t) \\ r_{n}(t)\end{array}\displaystyle \right ] =&\left [ \textstyle\begin{array}{c} s_{n}(0) \\ i_{n}(0) \\ r_{n}(0)\end{array}\displaystyle \right ]+ \frac {1}{n}N_{-1,1,0}\bigg(\int _{0}^{t} n\beta s_{n}(u)i_{n}(u)du \bigg)\left [ \textstyle\begin{array}{r} -1 \\ 1 \\ 0\end{array}\displaystyle \right ] \\ &+\text{ }\frac {1}{n}N_{0,-1,1}\bigg(\int _{0}^{t} n\gamma i_{n}(u)du \bigg)\left [ \textstyle\begin{array}{r} 0 \\ -1 \\ 1\end{array}\displaystyle \right ], \end{aligned}$$
(2)

where n is the total population size, and \(s_{n}(t)\), \(i_{n}(t)\), and \(r_{n}(t)\) again represent the fractions of susceptible, infected, and recovered. The processes \(\{N_{-1,1,0}(t)\}_{t\geq 0}\) and \(\{N_{0,-1,1}(t)\}_{ t\geq 0}\) are two independent standard Poisson processes, each responsible for one of the two types of discrete events in continuous time, namely, transmissions and recoveries. The term

$$ N_{-1,1,0}\bigg(\int _{0}^{t} n\beta s_{n}(u)i_{n}(u)du\bigg), $$

represents the modelled number of individuals infected during the interval \([0,t]\). This can be explained by assuming that the intensity of infections is proportional to the population size n and to the fractions of susceptible and infected individuals, respectively. One possibility is to identify β as the product of an intensity k of pairwise interactions that a single individual has and a probability p that a susceptible-infective encounter leads to transmission, i.e. \(\beta = kp\). The average number of contacts that an individual has per time unit is then k and the probability that a meeting between a susceptible and an infective results in an infection is p. The intensity with which a single susceptible is infected is then k times the empirical probability \(i_{n}\) that the other one is infected times the probability p that the other one will infect, that is, \(k i_{n}p=\beta i_{n}\). The total intensity of interactions leading to infection is then \(\beta i_{n}\) times the number of susceptible \(ns_{n}\), namely, \(\beta i_{n} ns_{n}\).

The term

$$ N_{0,-1,1}\bigg(\int _{0}^{t} n\gamma i_{n}(u)du\bigg), $$

represents the modelled number of recovered individuals during the interval \([0,t]\). This can be explained by the fact that the intensity at which recoveries occur is proportional to the number \(ni_{n}\) of infected individuals. The coefficient γ is then the intensity at which one single individual recovers, i.e., the time to recovery is exponentially distributed with a mean of \(1/\gamma \).
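A sample path of (2) can be generated with the direct (Gillespie) method, which is a standard way of simulating a continuous-time Markov chain with these two jump intensities. The sketch below is only an illustration: the function name, the seed handling and the choice to stop when no infected remain are assumptions made here, not part of the paper.

```python
import random


def gillespie_sir(n, i_count0, beta, gamma, t_end, seed=0):
    """Simulate the chain (2) by the direct (Gillespie) method.

    Counts are kept as integers; the returned path holds the time and the
    fractions (s_n, i_n, r_n) at time 0 and after every jump.
    """
    rng = random.Random(seed)
    s_count, i_count, r_count = n - i_count0, i_count0, 0
    t = 0.0
    path = [(t, s_count / n, i_count / n, r_count / n)]
    while i_count > 0:  # both intensities vanish once i_n = 0
        rate_inf = beta * s_count * i_count / n  # equals n * beta * s_n * i_n
        rate_rec = gamma * i_count               # equals n * gamma * i_n
        total = rate_inf + rate_rec
        t += rng.expovariate(total)              # waiting time to the next jump
        if t > t_end:
            break
        if rng.random() * total < rate_inf:
            s_count, i_count = s_count - 1, i_count + 1   # jump (-1, 1, 0)
        else:
            i_count, r_count = i_count - 1, r_count + 1   # jump (0, -1, 1)
        path.append((t, s_count / n, i_count / n, r_count / n))
    return path
```

For example, `gillespie_sir(200, 5, beta=0.5, gamma=0.1, t_end=50.0)` returns a piecewise constant path whose three fractions sum to one at every jump time.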

The system (1) is conservative in the sense that if \(s(0)+i(0)+r(0)=1\) then indeed \(s(t)+i(t)+r(t)=1\) for \(t>0\), which means that it makes sense to interpret \(s(t)\), \(i(t)\), and \(r(t)\) as fractions. Similarly, the system (2) is conservative in the sense that if \(s_{n}(0)+i_{n}(0)+r_{n}(0)=1\), then also \(s_{n}(t)+i_{n}(t)+r_{n}(t)=1\) for \(t>0\).

The system (1) can be seen as the limiting case of (2) as \(n\to \infty \). In fact, for any finite \(T>0\), under the assumption

$$ (|s_{n}(0)-s(0)|^{2}+|i_{n}(0)-i(0)|^{2}+|r_{n}(0)-r(0)|^{2})^{1/2}=O(1/ \sqrt{n}), $$

we have, almost surely,

$$ \sup _{t\in [0,T]}(|s_{n}(t)-s(t)|^{2}+|i_{n}(t)-i(t)|^{2}+|r_{n}(t)-r(t)|^{2})^{1/2}=O(1/ \sqrt{n}), $$
(3)

which follows by applying [13, Theorem 2.2] or [7, Chap. 11, Theorem 2.1] (using local Lipschitz continuity of the coefficients and that \((s(t),i(t),r(t))\) only has positive elements). A finer convergence result, expressed in terms of probabilities, can be found by applying [14, Theorem 1.1]. One can thus see (2) as an evolution model for the fractions in the different compartments and (1) as an approximation that can also be explained as the limit of (2). To get an intuition of why (3) holds, a compensated Poisson process representation is presented in Sect. 5.

The SIR model (1) is easily extended, for instance to

$$ \frac {d}{dt}\left [ \textstyle\begin{array}{c} s(t) \\ i(t) \\ r(t)\end{array}\displaystyle \right ] =\left [ \textstyle\begin{array}{c}\Lambda -\beta s(t)i(t)+\alpha _{1}s(t)\\ \beta s(t) i(t)-\gamma i(t)+\alpha _{2}i(t) \\\gamma i(t)+\alpha _{3}r(t)\end{array}\displaystyle \right ], $$
(4)

where Λ represents a migration rate and \(\alpha _{1}\), \(\alpha _{2}\), \(\alpha _{3}\) represent rate differences between births and deaths. If \(s(0)+i(0)+r(0)=1\) and \(-\Lambda =\alpha _{1}=\alpha _{2}=\alpha _{3}\), then \(s(t)+i(t)+r(t)=1\) for \(t>0\).
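The conservation property under \(-\Lambda =\alpha _{1}=\alpha _{2}=\alpha _{3}\) can be checked by adding the three rows of (4). Writing \(\sigma (t)=s(t)+i(t)+r(t)\) and α for the common value of \(\alpha _{1}\), \(\alpha _{2}\), \(\alpha _{3}\) (both symbols are introduced here only for this verification), the infection and recovery terms cancel and

$$ \frac {d}{dt}\sigma (t)=\Lambda +\alpha _{1}s(t)+\alpha _{2}i(t)+\alpha _{3}r(t)=\Lambda +\alpha \sigma (t)=\Lambda \big(1-\sigma (t)\big), $$

where the last equality uses \(\alpha =-\Lambda \). This is a linear equation for σ with the constant solution \(\sigma \equiv 1\), so by uniqueness \(\sigma (0)=1\) gives \(\sigma (t)=1\) for all \(t>0\).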

For non-conservative SIR models, for which \(s(0)+i(0)+r(0)=1\) may not imply \(s(t)+i(t)+r(t)=1\) for \(t>0\), for instance if \(-\Lambda =\alpha _{1}=\alpha _{2}=\alpha _{3}\) in (4) is not valid, an interpretation of \(s(t)\), \(i(t)\), and \(r(t)\) as fractions is in doubt. If \(\Lambda <0\) and \(\underline{\alpha}=\min (\alpha _{1},\alpha _{2},\alpha _{3})<0\) then the sum of \(s(t)\), \(i(t)\), and \(r(t)\) is even negative for \(t> -\ln (\Lambda /(\Lambda +\underline{\alpha} ))/\underline{\alpha}\). However, if \(\Lambda \geq 0\), and \(s(0)\), \(i(0)\), and \(r(0)\) are all positive, then by standard arguments \(s(t)\), \(i(t)\), and \(r(t)\) are all positive (for \(t<\tau =\inf \{u>0:s(u)i(u)r(u)=0\}\), \(r'(t)\geq \alpha _{3}r(t)\), \(i'(t)\geq (\alpha _{2}-\gamma )i(t)\) and \(s'(t)\geq (\alpha _{1}-\beta i(t))s(t)\), implying \(\tau =\infty \)).

Similarly, a finite-population, discrete-event, continuous-time system such as (2), but with varying population size due to migration, needs some contemplation. Instead of scaling with respect to a varying population size we suggest considering a region with area A and letting \(s_{A}\), \(i_{A}\) and \(r_{A}\) be the densities, i.e., the numbers in the three compartments divided by A. Here it is natural to express the migration rate Λ as the difference between immigration and emigration rates, \(\Lambda =\Lambda ^{+}-\Lambda ^{-}\), and similarly it is natural to express the differences between birth rates and death rates explicitly, \(\alpha _{i}=\lambda _{i}-\mu _{i}\), \(i=1,2,3\).

Then the model is, for all \(t< \tau _{A}\),

$$\begin{aligned} \left [ \textstyle\begin{array}{c} s_{A}(t) \\ i_{A}(t) \\ r_{A}(t)\end{array}\displaystyle \right ] =&\left [ \textstyle\begin{array}{c} s_{A}(0) \\ i_{A}(0) \\ r_{A}(0)\end{array}\displaystyle \right ]+ \frac {1}{A}N_{-1,1,0}\bigg(\int _{0}^{t} A\beta s_{A}(u)i_{A}(u)du \bigg)\left [ \textstyle\begin{array}{r} -1 \\ 1 \\ 0\end{array}\displaystyle \right ] \\ &+\text{ }\frac {1}{A}N_{0,-1,1}\bigg(\int _{0}^{t} A\gamma i_{A}(u)du \bigg)\left [ \textstyle\begin{array}{r} 0 \\ -1 \\ 1\end{array}\displaystyle \right ] \\ &+\text{ }\frac {1}{A}\biggl\{ N_{1,0,0}^{+,mig}\bigg(\int _{0}^{t} A \Lambda ^{+} du\bigg)-N_{1,0,0}^{-,mig}\bigg(\int _{0}^{t} A\Lambda ^{-} du\bigg) \biggr\} \left [ \textstyle\begin{array}{r} 1 \\ 0 \\ 0\end{array}\displaystyle \right ] \\ &+\text{ }\frac {1}{A}\biggl\{ N_{1,0,0}^{+}\bigg(\int _{0}^{t} A \lambda _{1} s_{A}(u)du\bigg)-N_{1,0,0}^{-}\bigg(\int _{0}^{t} A\mu _{1} s_{A}(u)du\bigg)\biggr\} \left [ \textstyle\begin{array}{r} 1 \\ 0 \\ 0\end{array}\displaystyle \right ] \\ &+\text{ }\frac {1}{A}\biggl\{ N_{0,1,0}^{+}\bigg(\int _{0}^{t} A \lambda _{2} i_{A}(u)du\bigg)-N_{0,1,0}^{-}\bigg(\int _{0}^{t} A\mu _{2} i_{A}(u)du\bigg)\biggr\} \left [ \textstyle\begin{array}{r} 0 \\ 1 \\ 0\end{array}\displaystyle \right ] \\ &+\text{ }\frac {1}{A}\biggl\{ N_{0,0,1}^{+}\bigg(\int _{0}^{t} A \lambda _{3} r_{A}(u)du\bigg)-N_{0,0,1}^{-}\bigg(\int _{0}^{t} A\mu _{3} r_{A}(u)du\bigg)\biggr\} \left [ \textstyle\begin{array}{r} 0 \\ 0 \\ 1\end{array}\displaystyle \right ], \end{aligned}$$
(5)

where independent Poisson processes \(N_{1,0,0}^{+,mig}\) (mig as in migration), \(N_{1,0,0}^{-,mig}\), \(N_{1,0,0}^{+}\), …, \(N_{0,0,1}^{-}\) are added, and

$$ \tau _{A}=\inf \{t> 0: s_{A}(t)-A^{-1}N_{1,0,0}^{-,mig}(A \Lambda ^{-}t)< 0\}. $$
(6)

It means that \(s_{A}(t)\), \(i_{A}(t)\), and \(r_{A}(t)\) are not negative for \(t<\tau _{A}\).

The term

$$ N_{-1,1,0}\bigg(\int _{0}^{t} A\beta s_{A}(u)i_{A}(u)du\bigg) $$

is the number of interactions in the region that lead to infection in the region. The rate with which infection occurs is proportional to the area A, the concentration \(s_{A}\) of susceptible, and the concentration \(i_{A}\) of infected. This is similar to chemical reaction modelling, but here with areas instead of volumes. The quicker the movements of individuals, the bigger β, and the quicker the infection is spread.

A more specific interpretation of β is to think that around each infective individual there is a region, a ‘contagion region’, such that a meeting with a susceptible individual within that region leads to infection with intensity α, i.e. a meeting with a susceptible individual in such a region leads to transmission with probability \(\alpha dt\) within an infinitesimal time period of length dt. With a the mean area of such a contagion region we let \(\beta =a\alpha \). With \(s_{A}\) the number of susceptible per area, the expected number of susceptible individuals within the ‘contagion region’ is \(as_{A}\). The intensity with which a single infective infects is then \(as_{A}\alpha =\beta s_{A}\). With \(Ai_{A}\) the total number of infective, the total intensity of infections is \(\beta s_{A} A i_{A}\).

Parameter β can then be seen as a time-space intensity of interactions that lead to infection.

Each of the other terms can be explained separately, for instance,

$$ N_{1,0,0}^{+,mig}\bigg(\int _{0}^{t} A\Lambda ^{+} du\bigg)=N_{1,0,0}^{+,mig}(A \Lambda ^{+} t) $$

is the number of immigrated individuals to the region in \([0,t]\); \(\Lambda ^{+}\) is the immigration rate per area unit and \(A\Lambda ^{+}\) is the immigration rate to the region.

Also for instance the term

$$ N_{1,0,0}^{+}\bigg(\int _{0}^{t} A\lambda _{1} s_{A}(u)du\bigg) $$

is the number of susceptible born in \([0,t]\), with each susceptible giving birth with intensity \(\lambda _{1}\), \(As_{A}(u)\) the number of susceptible in the region at time u and \(A\lambda _{1} s_{A}(u)\) the total birth intensity of those individuals at time u. The term \(N_{1,0,0}^{-}\bigg(\int _{0}^{t} A\mu _{1} s_{A}(u)du\bigg)\) is likewise the number of susceptible that die in \([0,t]\), with each susceptible having death intensity \(\mu _{1}\), \(As_{A}(u)\) the number of susceptible in the region at time u and \(A\mu _{1} s_{A}(u)\) the total death intensity of those individuals at time u.

For \(t>\tau _{A}\), the dynamics have to be reformulated since we should obviously have \(\min (s_{A}(t),i_{A}(t),r_{A}(t))\geq 0\). The critical term is, for \(\Lambda ^{-}>0\), \(N_{1,0,0}^{-,mig}\bigg(\int _{0}^{t} A\Lambda ^{-} du\bigg)\), for which the intensity is positive even though \(s_{A}(t)\) may be zero. Each of the other terms that decreases a compartment has zero intensity at time t if the corresponding density \(s_{A}(t)\), \(i_{A}(t)\), or \(r_{A}(t)\) is zero.
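The area-scaled model (5) can be simulated in the same event-driven way as a chain with ten event types. The sketch below is an illustration under assumptions made here: the function name and argument layout are hypothetical, and the stopping rule reads (6) as the first emigration event that would make the number of susceptible negative.

```python
import random

# Change in the counts (S, I, R) for each of the ten event types in (5).
JUMPS = [(-1, 1, 0), (0, -1, 1),        # infection, recovery
         (1, 0, 0), (-1, 0, 0),         # immigration, emigration
         (1, 0, 0), (-1, 0, 0),         # birth, death of a susceptible
         (0, 1, 0), (0, -1, 0),         # birth, death of an infected
         (0, 0, 1), (0, 0, -1)]         # birth, death of a recovered


def gillespie_area_sir(area, counts0, beta, gamma, lam_in, lam_out,
                       births, deaths, t_end, seed=0):
    """Simulate (5) up to min(t_end, tau_A); return (path, stopped_at_tau).

    counts0 = (S, I, R) are integer counts, densities are counts / area,
    births = (lambda_1, lambda_2, lambda_3), deaths = (mu_1, mu_2, mu_3).
    """
    rng = random.Random(seed)
    x = list(counts0)
    t = 0.0
    path = [(t, x[0] / area, x[1] / area, x[2] / area)]
    while True:
        s, i, r = x
        rates = [beta * s * i / area, gamma * i,   # A*beta*s_A*i_A, A*gamma*i_A
                 area * lam_in, area * lam_out,
                 births[0] * s, deaths[0] * s,
                 births[1] * i, deaths[1] * i,
                 births[2] * r, deaths[2] * r]
        total = sum(rates)
        if total == 0.0:
            return path, False
        t += rng.expovariate(total)
        if t > t_end:
            return path, False
        k = rng.choices(range(len(rates)), weights=rates)[0]
        if k == 3 and s == 0:
            return path, True   # emigration from an empty compartment
        x = [a + b for a, b in zip(x, JUMPS[k])]
        path.append((t, x[0] / area, x[1] / area, x[2] / area))
```

Only the emigration event has a positive intensity when its compartment is empty, which is why it is the only event that needs a guard in this sketch.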

3 Existence and uniqueness of a solution to (5)

Existence and uniqueness of a solution to (5) are shown iteratively, first up to the first jump time given the involved Poisson processes. Until the first jump time the process is constant. Then we get the solution up to the second jump time, observing again that the process is constant between the jump times. It is critical to show that the jump intensity does not become infinite in finite time. That can be secured by Gronwall’s inequality. By adding the lines in model (5), with \(n_{A}(t)=s_{A}(t)+i_{A}(t)+r_{A}(t)\) where all the terms are non-negative for \(t<\tau _{A}\), we get

$$\begin{aligned} n_{A}(t) =&n_{A}(0)+ \frac {1}{A}\biggl\{ N_{1,0,0}^{+,mig}\bigg(\int _{0}^{t} A\Lambda ^{+} du\bigg)-N_{1,0,0}^{-,mig}\bigg(\int _{0}^{t} A\Lambda ^{-} du\bigg)\biggr\} \\ &+\text{ }\frac {1}{A}\biggl\{ N_{1,0,0}^{+}\bigg(\int _{0}^{t} A \lambda _{1} s_{A}(u)du\bigg)-N_{1,0,0}^{-}\bigg(\int _{0}^{t} A\mu _{1} s_{A}(u)du\bigg)\biggr\} \\ &+\text{ }\frac {1}{A}\biggl\{ N_{0,1,0}^{+}\bigg(\int _{0}^{t} A \lambda _{2} i_{A}(u)du\bigg)-N_{0,1,0}^{-}\bigg(\int _{0}^{t} A\mu _{2} i_{A}(u)du\bigg)\biggr\} \\ &+\text{ }\frac {1}{A}\biggl\{ N_{0,0,1}^{+}\bigg(\int _{0}^{t} A \lambda _{3} r_{A}(u)du\bigg)-N_{0,0,1}^{-}\bigg(\int _{0}^{t} A\mu _{3} r_{A}(u)du\bigg)\biggr\} ,\\ \leq & n_{A}(0)+ \frac {1}{A}\biggl\{ N_{1,0,0}^{+,mig}\bigg(A \Lambda ^{+} t\bigg)+N_{1,0,0}^{-,mig}\bigg(A\Lambda ^{-}t\bigg) \biggr\} \\ &+\text{ }\frac {1}{A}\biggl\{ N_{1,0,0}^{+}\bigg(\int _{0}^{t} A \lambda _{1} n_{A}(u)du\bigg)+N_{1,0,0}^{-}\bigg(\int _{0}^{t} A\mu _{1} n_{A}(u)du\bigg)\biggl\} \\ &+\text{ }\frac {1}{A}\biggl\{ N_{0,1,0}^{+}\bigg(\int _{0}^{t} A \lambda _{2} n_{A}(u)du\bigg)+N_{0,1,0}^{-}\bigg(\int _{0}^{t} A\mu _{2} n_{A}(u)du\bigg)\biggr\} \\ &+\text{ }\frac {1}{A}\biggl\{ N_{0,0,1}^{+}\bigg(\int _{0}^{t} A \lambda _{3} n_{A}(u)du\bigg)+N_{0,0,1}^{-}\bigg(\int _{0}^{t} A\mu _{3} n_{A}(u)du\bigg)\biggr\} ,\, \, t\leq \tau _{A}, \end{aligned}$$

which, by a stochastic version of Gronwall’s inequality (see e.g. [13, Appendix]), implies that \(\sup _{t<\tau _{A}}E[n_{A}(t)]e^{-\alpha t}\) is finite for some \(\alpha >0\). This in turn implies a finite total intensity of jumps for \(n_{A}(t)\) and therefore for \(s_{A}(t)\), \(i_{A}(t)\), and \(r_{A}(t)\). For more general existence and uniqueness results for equations of the type (5), see for instance [7, Chap. 6, Theorem 4.1].

4 \(\mathcal{R}_{0}\) under the population size scaling and density scaling

We consider the number K of individuals that a single initial infected infects where all others are initially susceptible under the population size scaling (2) and the density scaling (5), respectively. We condition on the time τ until recovery of the first individual, which is exponentially distributed with mean \(1/\gamma \).

4.1 Population size scaling

For the model (2), the intensity with which that single individual infects is the intensity k of pairwise interactions times the probability p that a meeting with a susceptible leads to transmission times the empirical probability \(s_{n}\) that the other individual is susceptible, i.e. \(kps_{n}=\beta s_{n}\). Since initially only one individual is infected, \(i_{n}(0)=1/n\) and \(s_{n}(0)=1-1/n\), and K is approximately Po\((\int _{0}^{\tau }\beta s_{n}(u)du)\). With \(\mathcal{R}_{0}=E[K]\),

$$\begin{aligned} E[K] =&\int _{0}^{\infty }E[K|\tau =t]\gamma e^{-\gamma t}dt\approx \int _{0}^{\infty }\int _{0}^{t}\beta E[s_{n}(u)]du\gamma e^{-\gamma t}dt \\ \approx &\int _{0}^{\infty }\int _{0}^{t}\beta s_{n}(0)du\gamma e^{- \gamma t}dt \\ =& \beta (1-1/n)\int _{0}^{\infty }t\gamma e^{-\gamma t}dt\approx {\beta}/\gamma , \end{aligned}$$

for n large, in agreement with the standard \(\mathcal{R}_{0}\) for (1) obtained by the next generation method [15]. Since the average time until recovery is \(1/\gamma \), this means that the mean intensity of infections that the first individual creates is \(\beta =kp\), the product of the mean number of interactions the first individual has per time unit and the probability that such a meeting leads to a new infective.

4.2 Spatial scaling

For the model (5), the intensity with which a single infective infects when all others are susceptible is \(\beta s_{A}\), where we recall that \(\beta =a\alpha \), a is the mean area of the ‘contagion region’ of an infective, and α is the intensity with which a meeting between an infective and a susceptible results in a new infection. For a comparison with (2) we consider the case \(\Lambda ^{+}=\Lambda ^{-}=\lambda _{i}=\mu _{i}=0\) for \(i=1,2,3\). Then during the time τ, the number of infections caused by the first infective is, also here, approximately Po\((\int _{0}^{\tau }\beta s_{A}(t)dt)\). Here \(As_{A}(0)\) is the initial number of susceptible in the region with area A. Then

$$\begin{aligned} E[K|s_{A}(0)] =&E[E[K|\tau ]|s_{A}(0)]=\int _{0}^{\infty }E[K|\tau =t,s_{A}(0)] \gamma e^{-\gamma t}dt \\ \approx &\int _{0}^{\infty }\int _{0}^{t}\beta E[s_{A}(u)|s_{A}(0)]du \gamma e^{-\gamma t}dt \\ \approx & \int _{0}^{\infty }\int _{0}^{t}\beta E[s_{A}(0)|s_{A}(0)]du \gamma e^{-\gamma t}dt \\ =&\beta s_{A}(0) \int _{0}^{\infty }t\gamma e^{-\gamma t}dt= s_{A}(0){ \beta}/\gamma . \end{aligned}$$

During the infection time with mean \(1/\gamma \), the intensity with which that single infective infects is hence approximately \(s_{A}(0)\beta =as_{A}(0)\alpha \), where \(as_{A}(0)\) is approximately the mean number of susceptible in a contagion region around the infective, each of whom is infected with intensity α. If the susceptible are initially uniformly spread with intensity ρ individuals per area unit, i.e. the number of susceptible in a region with area A is Po\((\rho A)\), then the intensity with which that single infective infects is approximately \(\rho \beta =a\rho \alpha \), the mean area of the contagion region times the density times the intensity with which a meeting with a susceptible leads to infection. During the infection time with mean \(1/\gamma \), the mean number of infected is then \(E[K]\approx \rho \beta /\gamma \).
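The two approximations \(E[K]\approx \beta (1-1/n)/\gamma \) and \(E[K]\approx \rho \beta /\gamma \) can be checked by Monte Carlo under the same simplification as in the text, namely that the susceptible fraction or density stays at its initial value during the infectious period. The sketch below makes that assumption explicit; the function names and the parameter values (β = 0.5, γ = 0.25, n = 1000, ρ = 2) are chosen here only for illustration.

```python
import random


def sample_secondary_cases(beta, gamma, s0, rng):
    """Draw K: infections at constant rate beta*s0 during tau ~ Exp(gamma)."""
    tau = rng.expovariate(gamma)
    # Count the arrivals of a Poisson process with rate beta*s0 on [0, tau].
    k, t = 0, rng.expovariate(beta * s0)
    while t <= tau:
        k += 1
        t += rng.expovariate(beta * s0)
    return k


def estimate_r0(beta, gamma, s0, samples=20000, seed=0):
    """Monte Carlo estimate of E[K] with the susceptible level frozen at s0."""
    rng = random.Random(seed)
    draws = (sample_secondary_cases(beta, gamma, s0, rng) for _ in range(samples))
    return sum(draws) / samples


beta, gamma = 0.5, 0.25
r0_population = estimate_r0(beta, gamma, s0=1 - 1 / 1000)  # s_n(0) = 1 - 1/n
r0_area = estimate_r0(beta, gamma, s0=2.0)                 # s_A(0) = rho = 2
```

With these values the estimates should be close to \(\beta (1-1/n)/\gamma =1.998\) and \(\rho \beta /\gamma =4\), which illustrates that under area scaling the initial reproduction number depends on the density ρ and not only on \(\beta /\gamma \).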

5 ODE approximation

Similar to (3), we can, by applying the same references as for the derivation of (3), under the assumption

$$ (|s_{A}(0)-s(0)|^{2}+|i_{A}(0)-i(0)|^{2}+|r_{A}(0)-r(0)|^{2})^{1/2}=O(1/ \sqrt{A}), $$

for the case \(\Lambda ^{-}=0\), derive

$$ \sup _{t\in [0,T]}(|s_{A}(t)-s(t)|^{2}+|i_{A}(t)-i(t)|^{2}+|r_{A}(t)-r(t)|^{2})^{1/2}=O(1/ \sqrt{A})\qquad \text{a.s.} $$
(7)

where \((s_{A},i_{A},r_{A})\) and \((s,i,r)\) are the solutions of (5) and (4), respectively. A limiting result similar to (7) for the case \(\Lambda ^{-}>0\) needs to be modified including stopping times of the type \(\tau ^{\epsilon}=\inf \{t>0: s(t)<\epsilon \}\) for \(\epsilon >0\) as well as a corresponding stopping time for (5). A limiting result for the case \(\Lambda ^{-}>0\) is therefore beyond the scope of this paper.

To get an intuitive understanding of (7), we can make a compensated Poisson representation of (5). To keep the formulae short, we consider the case for which also \(\Lambda ^{+}=\Lambda ^{-}= \lambda _{i}=\mu _{i}=0\) for \(i=1,2,3\). The proof is almost identical for the general case, and can be found in the Appendix. With the compensated standard Poisson processes

$$ \tilde{N}_{-1,1,0}(t)=N_{-1,1,0}(t)-t,\qquad \tilde{N}_{0,-1,1}(t)=N_{0,-1,1}(t)-t, $$

we can then write (5) as

$$\begin{aligned} \left [ \textstyle\begin{array}{c} s_{A}(t) \\ i_{A}(t) \\ r_{A}(t)\end{array}\displaystyle \right ] =&\left [ \textstyle\begin{array}{c} s_{A}(0) \\ i_{A}(0) \\ r_{A}(0)\end{array}\displaystyle \right ]+ \int _{0}^{t} \beta s_{A}(u)i_{A}(u)du\left [ \textstyle\begin{array}{r} -1 \\ 1 \\ 0\end{array}\displaystyle \right ]+\text{ }\int _{0}^{t}\gamma i_{A}(u)du\left [ \textstyle\begin{array}{r} 0 \\ -1 \\ 1\end{array}\displaystyle \right ] \\ &+\text{ } \frac {1}{A}\tilde{N}_{-1,1,0}\bigg(\int _{0}^{t} A\beta s_{A}(u)i_{A}(u)du \bigg)\left [ \textstyle\begin{array}{r} -1 \\ 1 \\ 0\end{array}\displaystyle \right ] \\ &+\text{ }\frac {1}{A}\tilde{N}_{0,-1,1}\bigg(\int _{0}^{t} A\gamma i_{A}(u)du \bigg)\left [ \textstyle\begin{array}{r} 0 \\ -1 \\ 1\end{array}\displaystyle \right ]. \end{aligned}$$
(8)

By Doob’s martingale inequality, for each finite \(T>0\), \(\mathbb{P}\)-a.s.

$$ \lim \limits _{A\rightarrow \infty}\sup \limits _{s\leq T}\bigg\vert \frac{\tilde{N}_{-1,1,0}(As)}{A} \bigg\vert =\lim \limits _{A \rightarrow \infty}\sup \limits _{s\leq T}\bigg\vert \frac{\tilde{N}_{0,-1,1}(As)}{A} \bigg\vert =0. $$
(9)

This suggests that for large A, the two compensated Poisson processes \(\tilde{N}_{-1,1,0}\) and \(\tilde{N}_{0,-1,1}\) should be negligible in model (8) i.e. the solution of (8) converges to the solution of (4) (for the case \(\Lambda =\Lambda ^{+}-\Lambda ^{-}=\lambda _{i}=\mu _{i}=0\), \(i=1,2,3\)).
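The decay in (9) can be observed numerically on a single simulated path. The sketch below computes \(\sup _{s\leq T}|\tilde{N}(As)|/A\) for one standard Poisson process; the function name and the chosen values of A are assumptions made here for illustration.

```python
import random


def sup_scaled_compensated_poisson(area, horizon, seed=0):
    """Return sup_{s <= horizon} |N(area*s) - area*s| / area for one path."""
    rng = random.Random(seed)
    t_max = area * horizon
    t, count, sup = 0.0, 0, 0.0
    while True:
        t += rng.expovariate(1.0)   # next jump time of a rate-one Poisson process
        if t > t_max:
            # Between jumps N(t) - t is linear in t, so its absolute value
            # is largest at the ends of the interval.
            sup = max(sup, abs(count - t_max))
            break
        # Value just before the jump and just after it.
        sup = max(sup, abs(count - t), abs(count + 1 - t))
        count += 1
    return sup / area


values = [sup_scaled_compensated_poisson(a, horizon=1.0, seed=1)
          for a in (10.0, 1000.0, 100000.0)]
```

The entries of `values` are expected to shrink roughly like \(1/\sqrt{A}\), consistent with the rate in (7).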

6 Diffusion approximation

We only consider diffusion approximation of (5) under the assumption \(\Lambda ^{-}=0\) since with \(\Lambda ^{-}>0\) an analysis with stopping times as for the ODE approximation in the previous section is needed which is beyond the scope of this paper. Also here we study for convenience the case for which also \(\Lambda ^{+}=\lambda _{i}=\mu _{i}=0\) for \(i=1,2,3\). The case for which some of those parameters \(\Lambda ^{+}\), \(\lambda _{i}\), \(\mu _{i}\) are non-zero is also here almost identical.

We scale the compensated Poisson process with respect to the area appropriately,

$$ {W}_{-1,1,0}^{A}(t)=\frac {1}{\sqrt{A}}\tilde{N}_{-1,1,0}(At),\qquad {W}_{0,-1,1}^{A}(t)= \frac {1}{\sqrt{A}}\tilde{N}_{0,-1,1}(At). $$

Then

$$\begin{aligned} \left [ \textstyle\begin{array}{c} s_{A}(t) \\ i_{A}(t) \\ r_{A}(t)\end{array}\displaystyle \right ] =&\left [ \textstyle\begin{array}{c} s_{A}(0) \\ i_{A}(0) \\ r_{A}(0)\end{array}\displaystyle \right ]+\int _{0}^{t} \beta s_{A}(u)i_{A}(u)du\left [ \textstyle\begin{array}{r} -1 \\ 1 \\ 0\end{array}\displaystyle \right ]+\text{ }\int _{0}^{t} \gamma i_{A}(u)du\left [ \textstyle\begin{array}{r} 0 \\ -1 \\ 1\end{array}\displaystyle \right ] \\ &+\text{ }\frac {1}{\sqrt{A}}{W}_{-1,1,0}^{A}\bigg(\int _{0}^{t} \beta s_{A}(u)i_{A}(u)du\bigg)\left [ \textstyle\begin{array}{r} -1 \\ 1 \\ 0\end{array}\displaystyle \right ] \\ &+\text{ }\frac {1}{\sqrt{A}}{W}_{0,-1,1}^{A}\bigg(\int _{0}^{t} \gamma i_{A}(u)du\bigg)\left [ \textstyle\begin{array}{r} 0 \\ -1 \\ 1\end{array}\displaystyle \right ]. \end{aligned}$$

Donsker’s functional central limit theorem says that for large A,

$$ {W}_{-1,1,0}^{A}(t)=\frac {1}{\sqrt{A}}\tilde{N}_{-1,1,0}(At)\approx W_{-1,1,0}(t) $$

and

$$ {W}_{0,-1,1}^{A}(t)=\frac {1}{\sqrt{A}}\tilde{N}_{0,-1,1}(At)\approx W_{0,-1,1}(t), $$

where \(W_{-1,1,0}\) and \(W_{0,-1,1}\) are independent standard one-dimensional Brownian motions, and the approximation is uniform on bounded time intervals. This motivates a diffusion approximation \(\{\hat{x}_{A}(t)=[\hat{s}_{A}(t)\;\hat{i}_{A}(t)\;\hat{r}_{A}(t)]'\}_{t \geq 0}\) given by

$$\begin{aligned} \hat{x}_{A}(t) =& \hat{x}_{A}(0)+\int _{0}^{t}\beta \hat{s}_{A}(u)\hat{i}_{A}(u)du \left [ \textstyle\begin{array}{r} -1 \\ 1 \\ 0\end{array}\displaystyle \right ]+\int _{0}^{t} \gamma \hat{i}_{A}(u)du\left [ \textstyle\begin{array}{r} 0 \\ -1 \\ 1\end{array}\displaystyle \right ] \\ &+\text{ }\frac {1}{\sqrt{A}}{W}_{-1,1,0}\bigg(\int _{0}^{t} \beta \hat{s}_{A}(u)\hat{i}_{A}(u)du\bigg)\left [ \textstyle\begin{array}{r} -1 \\ 1 \\ 0\end{array}\displaystyle \right ] \\ &+\text{ }\frac {1}{\sqrt{A}}{W}_{0,-1,1}\bigg(\int _{0}^{t} \gamma \hat{i}_{A}(u)du\bigg)\left [ \textstyle\begin{array}{r} 0 \\ -1 \\ 1\end{array}\displaystyle \right ]. \end{aligned}$$
(10)

Note that existence, uniqueness and convergence as \(A\to \infty \) in (10) are not trivial. Particularly, the existence of a solution is not evident due to the non-standard nature of equation (10).

For existence and uniqueness of a solution of (10), we first consider the following SDE (which will later be translated into (10)):

$$\begin{aligned} \tilde{x}_{A}(t) =& \tilde{x}_{A}(0)+\int _{0}^{t}\beta \tilde{s}_{A}(u) \tilde{i}_{A}(u)du\left [ \textstyle\begin{array}{r} -1 \\ 1 \\ 0\end{array}\displaystyle \right ] +\int _{0}^{t} \gamma \tilde{i}_{A}(u)du\left [ \textstyle\begin{array}{r} 0 \\ -1 \\ 1\end{array}\displaystyle \right ] \\ &+\text{ }\frac {1}{\sqrt{A}}\int _{0}^{t} \sqrt{\beta \tilde{s}_{A}(u) \tilde{i}_{A}(u)}d{B}_{-1,1,0}(u)\left [ \textstyle\begin{array}{r} -1 \\ 1 \\ 0\end{array}\displaystyle \right ] \\ &+\text{ }\frac {1}{\sqrt{A}}\int _{0}^{t} \sqrt{\gamma \tilde{i}_{A}(u)}d{B}_{0,-1,1}(u) \left [ \textstyle\begin{array}{r} 0 \\ -1 \\ 1\end{array}\displaystyle \right ] , \end{aligned}$$
(11)

where \(B_{-1,1,0}\) and \(B_{0,-1,1}\) are independent standard one-dimensional Brownian motions.

For given \(M>0\), existence of a solution of (11) is obtained until the first exit time of \((0,M)^{3}\) under which the drift and dispersion are bounded and continuous, see for example [8, Theorem 2.2, p. 155]. For given \(\epsilon \in (0,M)\) uniqueness of a solution of (11) is obtained until the first exit time of \((\epsilon ,M)^{3}\) for which the drift and dispersion are Lipschitz continuous, see for example [8, Theorem 2.2, p. 155].
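A path of (11) up to the first exit from the box can be approximated with the Euler-Maruyama scheme. The following sketch is an assumption-laden illustration and not the numerical method of the paper: the function name, the step size and the rule of stopping at the first discrete step that leaves \((0,M)^{3}\) (with M passed as `upper`) are choices made here, and the initial value is assumed to lie inside the box.

```python
import math
import random


def euler_maruyama_sir(area, x0, beta, gamma, t_end, steps, upper, seed=0):
    """Euler-Maruyama sketch for (11), stopped at the first exit from (0, upper)^3.

    x0 = (s, i, r) is assumed to lie strictly inside the box.
    """
    rng = random.Random(seed)
    h = t_end / steps
    s, i, r = x0
    path = [(0.0, s, i, r)]
    for k in range(1, steps + 1):
        inf_rate, rec_rate = beta * s * i, gamma * i
        # Independent Brownian increments for B_{-1,1,0} and B_{0,-1,1}.
        db1 = rng.gauss(0.0, math.sqrt(h))
        db2 = rng.gauss(0.0, math.sqrt(h))
        # Drift plus square-root diffusion term, scaled by 1/sqrt(A).
        d_inf = inf_rate * h + math.sqrt(inf_rate / area) * db1
        d_rec = rec_rate * h + math.sqrt(rec_rate / area) * db2
        s, i, r = s - d_inf, i + d_inf - d_rec, r + d_rec
        if not all(0.0 < v < upper for v in (s, i, r)):
            break  # stop before recording a state outside the box
        path.append((k * h, s, i, r))
    return path
```

Because both noise terms act along directions whose components sum to zero, the scheme keeps \(\tilde{s}_{A}+\tilde{i}_{A}+\tilde{r}_{A}\) constant up to rounding, as the continuous equation (11) does.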

A solution to (10) can be constructed from (11). Let

$$ \tau _{1}(s)=\inf \bigg\{ t:\int _{0}^{t}\beta \tilde{s}_{A}(u) \tilde{i}_{A}(u)\mathrm{d}u>s\bigg\} \wedge \tau , $$
(12)

and

$$ \tau _{2}(s)=\inf \bigg\{ t:\int _{0}^{t}\gamma \tilde{i}_{A}(u) \mathrm{d}u>s\bigg\} \wedge \tau , $$
(13)

where τ is the first exit time of \((0,M)^{3}\). Let

$$ Z_{1}(t)=\int _{0}^{t} \sqrt{\beta \tilde{s}_{A}(u)\tilde{i}_{A}(u)} \mathrm{d}{B}_{-1,1,0}(u), $$

and

$$ Z_{2}(t)=\int _{0}^{t} \sqrt{\gamma \tilde{i}_{A}(u)}\mathrm{d}{B}_{0,-1,1}(u). $$

Using the optional sampling theorem we obtain that \(\{Z_{1}(\tau _{1}(t))\}_{t\geq 0}\) and \(\{Z_{2}(\tau _{2}(t))\}_{t\geq 0}\) are martingales. Moreover, since

$$ \langle Z_{1}(\tau _{1}(.))\rangle _{t}=\int _{0}^{\tau _{1}(t)} \beta \tilde{s}_{A}(u)\tilde{i}_{A}(u)\mathrm{d}u=t, $$

and similarly

$$ \langle Z_{2}(\tau _{2}(.))\rangle _{t}=\int _{0}^{\tau _{2}(t)} \gamma \tilde{i}_{A}(u)\mathrm{d}u=t, $$

we conclude that \(\{Z_{1}(\tau _{1}(t))\}_{t\geq 0}\) and \(\{Z_{2}(\tau _{2}(t))\}_{t\geq 0}\) are Brownian motions. The processes \(\{Z_{1}(\tau _{1}(t))\}_{t\geq 0}\) and \(\{Z_{2}(\tau _{2}(t))\}_{t\geq 0}\) are independent which follows from the fact that

$$\begin{aligned} & \mathbb{E}[Z_{1}(\tau _{1}(t))Z_{2}(\tau _{2}(t))] \\ =&\mathbb{E}\bigg[\int _{0}^{\tau _{1}(t)} \sqrt{\beta \tilde{s}_{A}(u) \tilde{i}_{A}(u)}\mathrm{d}{B}_{-1,1,0}(u)\int _{0}^{\tau _{2}(t)} \sqrt{\gamma \tilde{i}_{A}(u)}\mathrm{d}{B}_{0,-1,1}(u)\bigg]=0. \end{aligned}$$

Let \({W}_{-1,1,0}(t)=Z_{1}(\tau _{1}(t))\) and \({W}_{0,-1,1}(t)=Z_{2}(\tau _{2}(t))\). Then

$$ {W}_{-1,1,0}\bigg(\int _{0}^{t}\beta \tilde{s}_{A}(u)\tilde{i}_{A}(u)du \bigg)=\int _{0}^{t} \sqrt{\beta \tilde{s}_{A}(u)\tilde{i}_{A}(u)} \mathrm{d}{B}_{-1,1,0}(u), $$
(14)

and

$$ {W}_{0,-1,1}\bigg(\int _{0}^{t}\gamma \tilde{i}_{A}(u)du\bigg)=\int _{0}^{t} \sqrt{\gamma \tilde{i}_{A}(u)}\mathrm{d}{B}_{0,-1,1}(u). $$
(15)
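The step from the definitions of \(W_{-1,1,0}\) and \(\tau _{1}\) to (14) can be spelled out as follows; the symbol \(C_{1}\) is notation introduced only for this sketch. For \(t<\tau \) the process stays in \((0,M)^{3}\), so the integrand \(\beta \tilde{s}_{A}\tilde{i}_{A}\) is strictly positive and the clock

$$ C_{1}(t)=\int _{0}^{t}\beta \tilde{s}_{A}(u)\tilde{i}_{A}(u)\mathrm{d}u $$

is continuous and strictly increasing on \([0,\tau )\). Hence \(\tau _{1}\) in (12) inverts \(C_{1}\) there, and

$$ \tau _{1}(C_{1}(t))=t, \qquad W_{-1,1,0}(C_{1}(t))=Z_{1}(\tau _{1}(C_{1}(t)))=Z_{1}(t), \qquad t<\tau , $$

which is (14). The same argument with the clock \(\int _{0}^{t}\gamma \tilde{i}_{A}(u)\mathrm{d}u\) and \(\tau _{2}\) from (13) gives (15).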

Inserting (14) and (15) into (11) we obtain

$$\begin{aligned} \tilde{x}_{A}(t) =& \tilde{x}_{A}(0)+\int _{0}^{t}\beta \tilde{s}_{A}(u)\tilde{i}_{A}(u)du \left [ \textstyle\begin{array}{r} -1 \\ 1 \\ 0\end{array}\displaystyle \right ]+\int _{0}^{t} \gamma \tilde{i}_{A}(u)du\left [ \textstyle\begin{array}{r} 0 \\ -1 \\ 1\end{array}\displaystyle \right ] \\ &\text{ } +\frac {1}{\sqrt{A}}{W}_{-1,1,0}\bigg(\int _{0}^{t}\beta \tilde{s}_{A}(u)\tilde{i}_{A}(u)du\bigg)\left [ \textstyle\begin{array}{r} -1 \\ 1 \\ 0\end{array}\displaystyle \right ] \\ &+\text{ }\frac {1}{\sqrt{A}} {W}_{0,-1,1}\bigg(\int _{0}^{t}\gamma \tilde{i}_{A}(u)du\bigg)\left [ \textstyle\begin{array}{r} 0 \\ -1 \\ 1\end{array}\displaystyle \right ], \end{aligned}$$

which implies that \(\tilde{x}_{A}\) is a solution of (10) until the first exit time of \((0,M)^{3}\).

For convergence as \(A\to \infty \), a procedure can be as follows. First, a probability space carrying two Brownian motions \(B_{-1,1,0}(t)\) and \(B_{0,-1,1}(t)\) is given. Then the Brownian motions \(W_{-1,1,0}(t)\) and \(W_{0,-1,1}(t)\) are constructed as above. It is then possible to extend the probability space so that two independent standard Poisson processes \(N_{-1,1,0}(t)\) and \(N_{0,-1,1}(t)\) can be defined such that, for the compensated processes \(\tilde{N}_{-1,1,0}(t)={N}_{-1,1,0}(t)-t\) and \(\tilde{N}_{0,-1,1}(t)={N}_{0,-1,1}(t)-t\),

$$ \sup \limits _{t\geq 0} \frac{\tilde{N}_{-1,1,0}(t)-W_{-1,1,0}(t)}{\log (2\vee t)} $$

and

$$ \sup \limits _{t\geq 0} \frac{\tilde{N}_{0,-1,1}(t)-W_{0,-1,1}(t)}{\log (2\vee t)}, $$

have finite exponential moments (see e.g. [7, Corollary 5.5, p. 359] and [12, Lemma 3.12] with references emanating from [9]; see also [14, p. 237] for a somewhat different formulation of the difference between constructed compensated Poisson processes and Brownian motions). As a result, for \(\log (A)>1\), \(T>0\) and \(x_{A}(0)=\tilde{x}_{A}(0)\), there exists a random variable K such that

$$ \vert x_{A}(t)- \tilde{x}_{A}(t)\vert \leq K \dfrac{\log (A)}{A}\;\; \text{for}\,\,t\leq T\wedge \tau _{A}, $$

where \(\tau _{A}\) is the first exit time from \((\epsilon ,M)^{3}\) and \(\mathbb{E}[\exp (\lambda K)]<\infty \) for some \(\lambda >0\), see e.g. [12, Lemma 3.7 and Theorem 3.13] and [14, Theorem 1.1 and Theorem 1.2], where in the last reference the box \((\epsilon ,M)^{3}\) is replaced by a region which is allowed to depend on the scaling parameter, which here is A.

The exit probability of \(\hat{x}_{A}\) from \((\epsilon ,M)^{3}\) in \([0,T]\) is small if A is large. We have for instance:

Theorem 1

Let \(V: \mathbb{R}_{+}^{3}\ni x=(x_{1},x_{2},x_{3})\mapsto V(x)\in \mathbb{R}_{+}\),

$$ V(x)=2(x_{1}-\frac{\gamma}{2\beta}-\frac{\gamma}{2\beta}\ln \frac{x_{1}}{\gamma /(2\beta )})+2(x_{2}-1-\ln x_{2})+x_{3}-1-\ln x_{3}. $$

For a fixed horizon T,

$$ P(\tau _{G,A}< T)\leq [E[V(x(0))]+2\gamma T+\frac {1}{2A}(\gamma \frac {M}{\epsilon}+\beta \frac{M}{\epsilon}+\frac {\gamma}{M}+ \gamma \frac {M}{\epsilon ^{2}})T]/C_{\epsilon}^{M}, $$

where

$$\begin{aligned} C_{\epsilon}^{M} =& [(\epsilon -\frac{\gamma}{2\beta}- \frac{\gamma}{2\beta}\ln \frac{\epsilon}{\gamma /(2\beta )})\wedge 2( \epsilon -1-\ln \epsilon ) \\ &\textit{ }\wedge (M-\frac{\gamma}{2\beta}-\frac{\gamma}{2\beta}\ln \frac{M}{\gamma /(2\beta )})\wedge 2(M-1-\ln M)]. \end{aligned}$$

In particular,

$$ \lim _{\epsilon \downarrow 0, M\uparrow \infty} \lim _{A\to \infty}P( \tau _{G,A}< T)=0. $$

Proof

For \(t<\tau _{G}\), with \(x_{1}(t)={s}_{A}(t)\), \(x_{2}(t)={i}_{A}(t)\), \(x_{3}(t)={r}_{A}(t)\), \(x(t)=(x_{1}(t),x_{2}(t),x_{3}(t))^{tr}\), \(dV=LV+dM \),

$$\begin{aligned} dV =&[2(1-\frac{\gamma}{2\beta}\frac {1}{x_{1}})(-\beta x_{1}x_{2})+2(1- \frac {1}{x_{2}})(\beta x_{1}x_{2}-\gamma x_{2})+(1-\frac {1}{x_{3}}) \gamma x_{2}]dt \\ &+[\text{ }\frac {\gamma}{2\beta}\frac {1}{x_{1}^{2}} \frac{\beta x_{1}x_{2}}{A}+\frac {1}{x_{2}^{2}}( \frac{\beta x_{1}x_{2}}{A}+\frac{\gamma x_{2}}{A})+ \frac {1}{2x_{3}^{2}}\frac{\gamma x_{2}}{A}]dt+dM \\ =&[-2\beta x_{1}+2\gamma -\gamma \frac{x_{2}}{x_{3}}]dt \\ +&\text{ }\frac {1}{2A}[\gamma \frac {x_{2}}{x_{1}}+\beta \frac{x_{1}}{x_{2}}+\frac {\gamma}{x_{2}}+\gamma \frac {x_{2}}{x_{3}^{2}}]dt+dM \\ \leq &2\gamma dt+\frac {1}{2A}[\gamma \frac {x_{2}}{x_{1}}+\beta \frac{x_{1}}{x_{2}}+\frac {\gamma}{x_{2}}+\gamma \frac {x_{2}}{x_{3}^{2}}]dt+dM. \end{aligned}$$

We get

$$\begin{aligned} E[V(x(\tau _{G}\wedge T))]\leq E[V(x(0))]+2\gamma T+\frac {1}{2A}[ \gamma \frac {M}{\epsilon}+\beta \frac{M}{\epsilon}+\frac {\gamma}{M}+ \gamma \frac {M}{\epsilon ^{2}}]T. \end{aligned}$$

We also have

$$\begin{aligned} E[V(x(\tau _{G}\wedge T))] \geq &E[V(x(\tau _{G}\wedge T))1_{\tau _{G}< T}] \\ \geq & [(\epsilon -\frac{\gamma}{2\beta}- \frac{\gamma}{2\beta}\ln \frac{\epsilon}{\gamma /(2\beta )})\wedge 2( \epsilon -1-\ln \epsilon ) \\ &\text{ }\wedge (M-\frac{\gamma}{2\beta}-\frac{\gamma}{2\beta}\ln \frac{M}{\gamma /(2\beta )})\wedge 2(M-1-\ln M)]P(\tau _{G}< T) \\ =&C_{\epsilon}^{M}P(\tau _{G}< T). \end{aligned}$$

Hence

$$ P(\tau _{G}< T)\leq [E[V(x(0))]+2\gamma T+\frac {1}{2A}(\gamma \frac {M}{\epsilon}+\beta \frac{M}{\epsilon}+\frac {\gamma}{M}+ \gamma \frac {M}{\epsilon ^{2}})T]/C_{\epsilon}^{M}. $$
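The simplification of the first-order (drift) part in the proof can be checked numerically. The following is a minimal sketch, not part of the original argument: it evaluates the gradient of \(V\) against the SIR drift at random points and compares the result with the closed form \(-2\beta x_{1}+2\gamma -\gamma x_{2}/x_{3}\). The function names, the sampling range and the values \(\beta =5\), \(\gamma =2\) (taken from the simulation section) are illustrative assumptions, and the second-order terms are not covered.

```python
import random

def grad_V(x1, x2, x3, beta, gamma):
    """Gradient of the function V from Theorem 1."""
    return (2.0 * (1.0 - gamma / (2.0 * beta * x1)),
            2.0 * (1.0 - 1.0 / x2),
            1.0 - 1.0 / x3)

def drift_part(x1, x2, x3, beta, gamma):
    """First-order part of dV: grad V dotted with the SIR drift."""
    g1, g2, g3 = grad_V(x1, x2, x3, beta, gamma)
    return (g1 * (-beta * x1 * x2)
            + g2 * (beta * x1 * x2 - gamma * x2)
            + g3 * (gamma * x2))

def closed_form(x1, x2, x3, beta, gamma):
    """Simplified expression stated in the proof."""
    return -2.0 * beta * x1 + 2.0 * gamma - gamma * x2 / x3

rng = random.Random(0)          # fixed seed, chosen arbitrarily
beta, gamma = 5.0, 2.0          # values used in the simulation section
max_gap = 0.0
for _ in range(1000):
    x = [rng.uniform(0.1, 5.0) for _ in range(3)]   # illustrative range
    gap = abs(drift_part(*x, beta, gamma) - closed_form(*x, beta, gamma))
    max_gap = max(max_gap, gap)
print(max_gap)
```

Since \(x_{1},x_{2},x_{3}>0\), the closed form is bounded above by \(2\gamma \), which is the bound used in the last line of the display.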

The diffusion approximation of (5) allowing also \(\Lambda ^{-}\) to be positive is

$$\begin{aligned} \tilde{x}_{A}(t) =& \tilde{x}_{A}(0)+\int _{0}^{t}\beta \tilde{s}_{A}(u) \tilde{i}_{A}(u)du\left [ \textstyle\begin{array}{r} -1 \\ 1 \\ 0\end{array}\displaystyle \right ] +\int _{0}^{t} \gamma \tilde{i}_{A}(u)du\left [ \textstyle\begin{array}{r} 0 \\ -1 \\ 1\end{array}\displaystyle \right ] \\ &+\text{ }\int _{0}^{t} \Lambda ^{+}-\Lambda ^{-}+(\lambda _{1}-\mu _{1}) \tilde{s}_{A}(u)du\left [ \textstyle\begin{array}{r} 1 \\ 0 \\ 0\end{array}\displaystyle \right ] \\ &+\text{ }\int _{0}^{t} (\lambda _{2}-\mu _{2})\tilde{i}_{A}(u)du \left [ \textstyle\begin{array}{r} 0 \\ 1 \\ 0\end{array}\displaystyle \right ]+\int _{0}^{t} (\lambda _{3}-\mu _{3})\tilde{r}_{A}(u)du \left [ \textstyle\begin{array}{r} 0 \\ 0 \\ 1\end{array}\displaystyle \right ] \\ &+\text{ }\frac {1}{\sqrt{A}}\int _{0}^{t} \sqrt{\beta \tilde{s}_{A}(u) \tilde{i}_{A}(u)}d{B}_{-1,1,0}(u)\left [ \textstyle\begin{array}{r} -1 \\ 1 \\ 0\end{array}\displaystyle \right ] \\ &+\text{ }\frac {1}{\sqrt{A}}\int _{0}^{t} \sqrt{\gamma \tilde{i}_{A}(u)}d{B}_{0,-1,1}(u)\left [ \textstyle\begin{array}{r} 0 \\ -1 \\ 1\end{array}\displaystyle \right ] \\ &+\text{ }\frac {1}{\sqrt{A}}\bigg(\int _{0}^{t} \sqrt{\Lambda ^{+}}d{B}_{1,0,0}^{+,mig}(u)- \int _{0}^{t} \sqrt{\Lambda ^{-}}d{B}_{1,0,0}^{-,mig}(u)\bigg) \left [ \textstyle\begin{array}{r} 1 \\ 0 \\ 0\end{array}\displaystyle \right ] \\ &+\text{ }\frac {1}{\sqrt{A}}\int _{0}^{t} \sqrt{\lambda _{1} \tilde{s}_{A}(u)}dB_{1,0,0}^{+}(u)\left [ \textstyle\begin{array}{r} 1 \\ 0 \\ 0\end{array}\displaystyle \right ] -\text{ }\frac {1}{\sqrt{A}}\int _{0}^{t} \sqrt{\mu _{1} \tilde{s}_{A}(u)}dB_{1,0,0}^{-}(u)\left [ \textstyle\begin{array}{r} 1 \\ 0 \\ 0\end{array}\displaystyle \right ] \\ &+\text{ }\frac {1}{\sqrt{A}}\int _{0}^{t} \sqrt{\lambda _{2} \tilde{i}_{A}(u)}dB_{0,1,0}^{+}(u)\left [ \textstyle\begin{array}{r} 0 \\ 1 \\ 0\end{array}\displaystyle \right ] -\text{ }\frac {1}{\sqrt{A}}\int _{0}^{t} \sqrt{\mu _{2} \tilde{i}_{A}(u)}dB_{0,1,0}^{-}(u)\left [ \textstyle\begin{array}{r} 0 \\ 1 \\ 0\end{array}\displaystyle \right ] \\ &+\text{ }\frac {1}{\sqrt{A}}\int _{0}^{t} \sqrt{\lambda _{3} \tilde{r}_{A}(u)}dB_{0,0,1}^{+}(u)\left [ \textstyle\begin{array}{r} 0 \\ 0 \\ 1\end{array}\displaystyle \right ] -\text{ }\frac {1}{\sqrt{A}}\int _{0}^{t} \sqrt{\mu _{3} \tilde{r}_{A}(u)}dB_{0,0,1}^{-}(u)\left [ \textstyle\begin{array}{r} 0 \\ 0 \\ 1\end{array}\displaystyle \right ], \end{aligned}$$
(16)

where \(B_{-1,1,0},\ldots ,B_{0,0,1}^{-}\) are independent standard one-dimensional Brownian motions.

Remark 1

(Generator argument of ODE- and SDE approximations)

Heuristic generator arguments for why the ODE approximation and the SDE approximation can be used for the Poissonian SIR model (5) are as follows. We consider for simplicity the case \(\Lambda ^{+}=\Lambda ^{-}=\lambda _{i}=\mu _{i}=0\) for \(i=1,2,3\). The critical case \(\Lambda ^{-}>0\), not taken care of here, needs a finer reasoning. In terms of transition probabilities we have, in this case:

$$\begin{aligned}& \mathbb{P}\left (\left [ \textstyle\begin{array}{c} s_{A}(t+h) \\ i_{A}(t+h) \\ r_{A}(t+h)\end{array}\displaystyle \right ]=\left [ \textstyle\begin{array}{c} s_{A}(t) \\ i_{A}(t) \\ r_{A}(t)\end{array}\displaystyle \right ]+\left [ \textstyle\begin{array}{r} -1/A \\ 1/A \\ 0\end{array}\displaystyle \right ]\right )=A\beta s_{A}(t)i_{A}(t) h+o(h),\\& \mathbb{P}\left (\left [ \textstyle\begin{array}{c} s_{A}(t+h) \\ i_{A}(t+h) \\ r_{A}(t+h)\end{array}\displaystyle \right ]=\left [ \textstyle\begin{array}{c} s_{A}(t) \\ i_{A}(t) \\ r_{A}(t)\end{array}\displaystyle \right ]+\left [ \textstyle\begin{array}{r} 0 \\ -1/A \\ 1/A\end{array}\displaystyle \right ]\right )= \gamma {Ai_{A}(t)}h+o(h),\\& \mathbb{P}\left (\left [ \textstyle\begin{array}{c} s_{A}(t+h) \\ i_{A}(t+h) \\ r_{A}(t+h)\end{array}\displaystyle \right ]=\left [ \textstyle\begin{array}{c} s_{A}(t) \\ i_{A}(t) \\ r_{A}(t)\end{array}\displaystyle \right ]\right )=1-A(\beta s_{A}(t) i_{A}(t) +\gamma i_{A}(t))h+o(h). \end{aligned}$$

For \(f\in C^{2}(\mathbb{R}^{3};\mathbb{R})\), with ∇f its gradient and H its Hessian, we then get for \((x_{1},x_{2},x_{3}) \in \{0, 1/A,2/A,\ldots \}^{3}\),

$$\begin{aligned} G_{A}f\left (\left [ \textstyle\begin{array}{r} x_{1} \\ x_{2} \\ x_{3}\end{array}\displaystyle \right ]\right ) =&\lim _{h\downarrow 0} \frac {1}{h}\mathbb{E}\left [f \left (\left [ \textstyle\begin{array}{r} {s_{A}(h)} \\ {i_{A}(h)} \\ {r_{A}(h)}\end{array}\displaystyle \right ]\right )-f\left (\left [ \textstyle\begin{array}{r} {s_{A}(0)} \\ {i_{A}(0)} \\ {r_{A}(0)}\end{array}\displaystyle \right ]\right )\Bigg\vert \left [ \textstyle\begin{array}{r} {s_{A}(0)} \\ {i_{A}(0)} \\ {r_{A}(0)}\end{array}\displaystyle \right ]=\left [ \textstyle\begin{array}{r} x_{1} \\ x_{2} \\ x_{3}\end{array}\displaystyle \right ]\right ] \\ =&A\beta x_{1}x_{2}\left (f\left (\left [ \textstyle\begin{array}{r} x_{1} \\ x_{2} \\ x_{3}\end{array}\displaystyle \right ]+\left [ \textstyle\begin{array}{r} {-1}/A \\ {1}/A \\ {0}/A\end{array}\displaystyle \right ]\right )-f\left (\left [ \textstyle\begin{array}{r} x_{1} \\ x_{2} \\ x_{3}\end{array}\displaystyle \right ]\right )\right ) \\ &+\text{ }A\gamma x_{2}\left (f\left (\left [ \textstyle\begin{array}{r} x_{1} \\ x_{2} \\ x_{3}\end{array}\displaystyle \right ]+\left [ \textstyle\begin{array}{r} {0}/A \\ {-1}/A \\ {1}/A\end{array}\displaystyle \right ]\right )-f\left (\left [ \textstyle\begin{array}{r} x_{1} \\ x_{2} \\ x_{3}\end{array}\displaystyle \right ]\right )\right ) \\ =&A\beta x_{1}x_{2}\nabla f\left (\left [ \textstyle\begin{array}{r} x_{1} \\ x_{2} \\ x_{3}\end{array}\displaystyle \right ]\right )^{tr}\left [ \textstyle\begin{array}{r} {-1}/A \\ {1}/A \\ {0}/A\end{array}\displaystyle \right ]+\text{ }A\gamma x_{2}\nabla f\left (\left [ \textstyle\begin{array}{r} x_{1} \\ x_{2} \\ x_{3}\end{array}\displaystyle \right ]\right )^{tr}\left [ \textstyle\begin{array}{r} {0}/A \\ {-1}/A \\ {1}/A\end{array}\displaystyle \right ] \\ &+\text{ }\frac {1}{2}A\beta x_{1}x_{2}\left [ \textstyle\begin{array}{r} {-1}/A \\ {1}/A \\ {0}/A\end{array}\displaystyle \right ]^{tr}H\left (\left [ \textstyle\begin{array}{r} x_{1} \\ x_{2} \\ x_{3}\end{array}\displaystyle \right ]\right )\left [ \textstyle\begin{array}{r} {-1}/A \\ {1}/A \\ {0}/A\end{array}\displaystyle \right ] \\ &+\text{ }\frac {1}{2} A\gamma x_{2}\left [ \textstyle\begin{array}{r} {0}/A \\ {-1}/A \\ {1}/A\end{array}\displaystyle \right ]^{tr}H\left (\left [ \textstyle\begin{array}{r} x_{1} \\ x_{2} \\ x_{3}\end{array}\displaystyle \right ]\right )\left [ \textstyle\begin{array}{r} {0}/A \\ {-1}/A \\ {1}/A\end{array}\displaystyle \right ] +o(1/A). \end{aligned}$$

If we neglect terms of order \(1/A\) or higher, then, at the grid points, \(G_{A}\) is the generator of (1). If we neglect only the \(o(1/A)\) remainder term, then, at the grid points, \(G_{A}\) is the generator of (11) up to the first hitting time τ of the axes: \(\tau =\inf \{t>0: s(t)i(t)r(t)=0\}\).
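The expansion in Remark 1 can be illustrated numerically. The sketch below is an illustration under assumptions rather than part of the argument: it uses a hypothetical test function \(f(x)=\exp (c\cdot x)\) with an arbitrarily chosen vector \(c\) (called `COEF` in the code), for which gradient and Hessian terms are available in closed form, and compares the jump-chain generator with its first-order (ODE) and second-order (SDE) truncations at one grid point.

```python
import math

BETA, GAMMA = 5.0, 2.0
COEF = (0.3, -0.2, 0.1)                       # hypothetical choice: f(x) = exp(COEF . x)
JUMPS = ((-1.0, 1.0, 0.0), (0.0, -1.0, 1.0))  # infection and recovery directions

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def f(x):
    return math.exp(dot(COEF, x))

def rates(x):
    """Unscaled intensities beta*x1*x2 and gamma*x2."""
    return (BETA * x[0] * x[1], GAMMA * x[1])

def chain_generator(x, A):
    """G_A f(x): jumps of size 1/A at rates A*beta*x1*x2 and A*gamma*x2."""
    total = 0.0
    for rate, jump in zip(rates(x), JUMPS):
        shifted = [xi + li / A for xi, li in zip(x, jump)]
        total += A * rate * (f(shifted) - f(x))
    return total

def ode_generator(x):
    """First-order part; for this f, grad f(x) . l = f(x) * (COEF . l)."""
    return sum(rate * f(x) * dot(COEF, jump)
               for rate, jump in zip(rates(x), JUMPS))

def sde_generator(x, A):
    """Adds the Hessian part; for this f, l^T H(x) l = f(x) * (COEF . l)**2."""
    second = sum(rate * f(x) * dot(COEF, jump) ** 2
                 for rate, jump in zip(rates(x), JUMPS))
    return ode_generator(x) + second / (2.0 * A)

x0 = (4.0, 1.0, 0.0)   # a grid point for each A below
errors = {}
for A in (10, 100, 1000):
    g = chain_generator(x0, A)
    errors[A] = (abs(g - ode_generator(x0)), abs(g - sde_generator(x0, A)))
    print(A, errors[A])
```

For this smooth test function the gap to the ODE generator should shrink roughly like \(1/A\), and the gap to the SDE generator faster, in line with the \(o(1/A)\) remainder.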

Remark 2

(Stochastically perturbed SIR modeling: parameter perturbation)

There is a rich flora of stochastic models that are based on deterministic ODE models in which some parameters are subjected to noise; see, among others, [1, 3, 4, 6].

For instance, consider the SIR ODE (1). Suppose that β is not precisely known and, for simplicity, fluctuates randomly around a constant \(\beta _{0}\); for example, β is replaced by \(\beta _{0} +\sigma \frac{dW}{dt}\), where \(\frac{dW}{dt}\) can be interpreted as a Gaussian white noise, i.e. \(\beta dt\) is replaced by \(\beta _{0} dt +\sigma dW\), where W is a Brownian motion. We then get the SDE

$$ d\left [ \textstyle\begin{array}{c} s(t) \\ i(t) \\ r(t)\end{array}\displaystyle \right ] =\left [ \textstyle\begin{array}{c}-\beta _{0}s(t)i(t) \\\beta _{0} s(t) i(t)-\gamma i(t) \\\gamma i(t)\end{array}\displaystyle \right ]dt+ \sigma s(t)i(t)\left [ \textstyle\begin{array}{r}-1 \\1 \\0\end{array}\displaystyle \right ]dW. $$
(17)

In (17) there are no square root dispersion terms as opposed to (11). Starting in \((0,\infty )^{3}\), the solution of (17) remains in \((0,\infty )^{3}\), see for instance [1, 3, 4, 6], while for (11), the boundary of \((0,\infty )^{3}\) may be hit.

For the SDE (17), the uncertainty is added to the ODE (1). For the SDE (11), the uncertainty is instead built into the model, taking into account that the spread of diseases is by its nature uncertain.

7 Simulations

Simulations of the continuous-time Markov chain (5), the SDE (16), and the ODE (4) for some different values of \(\Lambda ^{+}\), \(\Lambda ^{-}\), \(\lambda _{i}\), and \(\mu _{i}\) for \(i=1,2,3\), are here presented. The simulations of (5) are exact in the sense that there are no numerical errors. For the SDE (16) and the ODE (4), Euler (Maruyama) approximations are applied.
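The text does not name the algorithm behind the exact simulation of (5). A standard choice for continuous-time Markov chains is the Gillespie (stochastic simulation) algorithm, which is assumed in the sketch below. The sketch covers only the case \(\Lambda ^{+}=\Lambda ^{-}=\lambda _{i}=\mu _{i}=0\), with the infection and recovery rates \(A\beta s_{A}i_{A}\) and \(A\gamma i_{A}\) from Remark 1; the function name, seed and time horizon are illustrative.

```python
import random

def gillespie_sir(S, I, R, beta, gamma, A, t_end, rng):
    """Gillespie simulation of the SIR jump chain without demography.

    S, I, R are individual counts; densities are counts divided by A.
    Infection fires at rate A*beta*s*i and recovery at rate A*gamma*i,
    where s = S/A and i = I/A.
    """
    t = 0.0
    path = [(t, S / A, I / A, R / A)]
    while I > 0:
        rate_inf = beta * S * I / A      # equals A * beta * (S/A) * (I/A)
        rate_rec = gamma * I             # equals A * gamma * (I/A)
        total = rate_inf + rate_rec
        t += rng.expovariate(total)      # exponential waiting time
        if t > t_end:
            break
        if rng.random() * total < rate_inf:
            S, I = S - 1, I + 1          # infection: jump (-1, 1, 0)/A
        else:
            I, R = I - 1, R + 1          # recovery: jump (0, -1, 1)/A
        path.append((t, S / A, I / A, R / A))
    return path

# Parameter values from the simulation section: beta=5, gamma=2, A=10,
# initial counts 40, 10, 0. Seed and horizon are arbitrary.
path = gillespie_sir(40, 10, 0, beta=5.0, gamma=2.0, A=10,
                     t_end=5.0, rng=random.Random(1))
print(len(path), path[-1])
```

Only the random waiting times and jump choices enter, so no time-discretization error arises; in this case without demography the total density stays at its initial value 5.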

For the case \(\Lambda ^{-}>0\), the simulations are stopped just before the susceptible part becomes negative, whereafter (5), (16), and (4) are not valid anymore.

For the case \(\Lambda ^{-}=0\), negative values do not occur for (5) and (4), while for (16) a solution is defined until the first hitting time of the axes. The behaviour of the solution of (16) near the boundary of \([0,\infty )^{3}\) is not evident. In the simulations it was found reasonable to let the numerical solution continue after hitting the axes by applying the function \(x\mapsto \max (x,0)\) at each Euler-Maruyama step, i.e. a projected version of the Euler-Maruyama scheme is applied. This means that the numerical solution solves a constrained version of the SDE (16), reflecting inwards at the boundary of \([0,\infty )^{3}\). It is then not obvious whether the reflected version of (16) needs additional ‘push-terms’ (described by the local times at the boundary) or is instantaneously reflecting without push-terms. An alternative simulation could be, for the case when also \(\Lambda ^{+}=0\), to let \(\{0\}\) be absorbing for both the susceptible and the infective compartments while allowing \(\{0\}\) to be reflecting for the recovered compartment, so that the dynamics can evolve even if the initial value for the recovered compartment is zero. A third alternative could be to allow the numerical simulation to take negative values and to replace the square roots by square roots of absolute values. The SDE (16) is after all not a model but an approximation of (5).
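The projected Euler-Maruyama step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the figures: it treats only the case \(\Lambda ^{+}=\Lambda ^{-}=\lambda _{i}=\mu _{i}=0\), in which (16) reduces to (11), and the step size `h`, the number of steps, the seed and the function name are hypothetical choices.

```python
import math
import random

def projected_euler_maruyama(s, i, r, beta, gamma, A, h, n_steps, rng):
    """Euler-Maruyama for the SIR diffusion approximation without demography,
    with the projection x -> max(x, 0) applied componentwise after each step."""
    path = [(s, i, r)]
    for _ in range(n_steps):
        inf = beta * s * i                      # infection intensity
        rec = gamma * i                         # recovery intensity
        dB1 = rng.gauss(0.0, math.sqrt(h))      # Brownian increment, infection
        dB2 = rng.gauss(0.0, math.sqrt(h))      # Brownian increment, recovery
        n1 = math.sqrt(inf / A) * dB1           # noise along (-1, 1, 0)
        n2 = math.sqrt(rec / A) * dB2           # noise along (0, -1, 1)
        s_new = s - inf * h - n1
        i_new = i + (inf - rec) * h + n1 - n2
        r_new = r + rec * h + n2
        # Projection keeps every component non-negative, so the square
        # roots in the next step are well defined.
        s, i, r = max(s_new, 0.0), max(i_new, 0.0), max(r_new, 0.0)
        path.append((s, i, r))
    return path

# beta=5, gamma=2, A=10 and initial densities 4, 1, 0 as in the text;
# h, n_steps and the seed are illustrative.
em_path = projected_euler_maruyama(4.0, 1.0, 0.0, beta=5.0, gamma=2.0, A=10,
                                   h=0.001, n_steps=2000, rng=random.Random(2))
print(em_path[-1])
```

Dropping the two noise terms `n1` and `n2` leaves the plain Euler scheme for the corresponding ODE. The projection is one of several possible boundary treatments, as discussed in the paragraph above.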

In all the simulations (see Figs. 1 to 6), \(\beta =5\), \(\gamma =2\), \(A=10\), and the initial numbers of individuals in the three compartments in the region are 40, 10, and zero, respectively, i.e., the initial densities for the three compartments are \(40/10=4\), \(10/10=1\), and zero, respectively.

Figure 1

One simulation of the spatially scaled continuous-time Markov chain (5) and one of its diffusion approximation (16), together with the ODE approximation (4), for \(\Lambda ^{+}=\Lambda ^{-}=\lambda _{i}=\mu _{i}=0\), \(i=1,2,3\)

Figure 2

Two other simulations of the spatially scaled continuous-time Markov chain (5) and two simulations of its diffusion approximation (16), together with the ODE approximation (4), for \(\Lambda ^{+}=\Lambda ^{-}=\lambda _{i}=\mu _{i}=0\), \(i=1,2,3\)

Figure 3

One simulation of the spatially scaled continuous-time Markov chain (5) and one of its diffusion approximation (16), together with the ODE approximation (4), for \(\mu _{1}=4\), \(\Lambda ^{+}=\Lambda ^{-}=\lambda _{1}=\mu _{2}=\lambda _{2}=\mu _{3}= \lambda _{3}=0\)

Figure 4

One simulation of the spatially scaled continuous-time Markov chain (5) and one of its diffusion approximation (16), together with the ODE approximation (4), for \(\Lambda ^{-}=2\), \(\Lambda ^{+}=\lambda _{i}=\mu _{i}=0\), \(i=1,2,3\). All are stopped just before the susceptible area density becomes negative; for the continuous-time Markov chain this is the exit time \(\tau _{A}\) defined by (6)

Figure 5

50 simulations of the spatially scaled continuous-time Markov chain (5), indicating the probability density at each time point for three cases: (a) \(\Lambda ^{+}=\Lambda ^{-}= \lambda _{i}=\mu _{i}=0\) for \(i=1,2,3\); (b) \(\mu _{1}=4\), \(\Lambda ^{+}=\Lambda ^{-}=\lambda _{1}=\mu _{2}=\lambda _{2}=\mu _{3}= \lambda _{3}=0\); (c) \(\Lambda ^{-}=2\), \(\Lambda ^{+}=\lambda _{i}=\mu _{i}=0\) for \(i=1,2,3\). The scaling of the axes is the same in all panels. Panels (a) and (b) indicate that deaths of susceptibles decrease the area density for each compartment. Panel (c), which is a case with emigration, indicates the probability distribution of the exit time \(\tau _{A}\) defined by (6). Panel (c) is to be compared with (b), where the Markov chain never exits even though deaths are allowed

Figure 6

50 Euler-Maruyama simulations of the spatially scaled SDE approximation (16), together with the ODE approximation (4), indicating the probability density at each time point for three cases: (a) \(\Lambda ^{+}=\Lambda ^{-}=\lambda _{i}=\mu _{i}=0\) for \(i=1,2,3\); (b) \(\mu _{1}=4\), \(\Lambda ^{+}=\Lambda ^{-}=\lambda _{1}=\mu _{2}=\lambda _{2}=\mu _{3}= \lambda _{3}=0\); (c) \(\Lambda ^{-}=2\), \(\Lambda ^{+}=\lambda _{i}=\mu _{i}=0\) for \(i=1,2,3\). The scaling of the axes is again the same in all panels. The panels here are similar to those in Fig. 5. In (c), the simulations are stopped just before the susceptible part becomes negative. The probability density of that exit time is also indicated here. The infective component and the recovered component are not stopped for the Euler-Maruyama approximations; instead they are projected so that they remain non-negative, as for the continuous-time Markov chain; the dynamics stop only at the exit time for the density of susceptibles

8 Conclusion

To conclude, in this study we acknowledge that the population size changes over time due to factors like births, deaths or migration. This means that using population size as a scaling parameter, as follows directly from previous studies [11–14], is not suitable. Instead, we propose a different approach: we model the number of individuals in each compartment in a region of a specific area using a birth-death type continuous-time Markov chain. In this setup, the area size becomes the scaling parameter, and the resulting scaled Markov chain shows the density of individuals in each compartment.

This approach allows us to approximate the scaled continuous-time Markov chains by classical ordinary differential equations for large regions, and by stochastic differential equations for regions of intermediate size. We underline that for the case with constant total population size, the variables in a classical epidemiological ordinary differential equation can describe fractions of individuals in each compartment, while here, allowing non-constant population size due to migration or births and deaths, the classical ordinary differential equations describe the areal density of individuals in each compartment. The transmission parameter will then have an interpretation related to densities of individuals.

In summary, our research provides a detailed analysis of epidemic modeling techniques, focusing on understanding compartmental quantities like susceptible s, infected i, and recovered r. We introduce a new density-dependent approach inspired by Kurtz’s work from the 1970s to clarify the significance of key parameters. Unlike traditional methods that rely on constant population size scaling, our approach uses area-based scaling, enabling us to adapt to fluctuations in population size. Through this method, we aim to provide a clearer and more transparent illustration of compartment quantities and parameters in epidemic modeling.

Potential future directions involve investigation of the distributions of the first hitting times for the approximate diffusion process on the axes as well as investigation of continuations of solutions after the first hitting times and, for the case with emigration, the first hitting time of the axes for the continuous-time Markov chain.