Analysing the Effect of Test-and-Trace Strategy in an SIR Epidemic Model

Consider a Markovian SIR epidemic model in a homogeneous community. To this model we add a rate at which individuals are tested; once an infectious individual tests positive it is isolated, and each of its contacts is traced and tested independently with some fixed probability. If such a traced individual tests positive it is isolated, and the contact tracing is iterated. This model is analysed using large population approximations, both for the early stage of the epidemic, when the “to-be-traced components” of the epidemic behave like a branching process, and for the main stage of the epidemic, where the process of to-be-traced components converges to a deterministic process defined by a system of differential equations. These approximations are used to quantify the effect of testing and of contact tracing on the effective reproduction numbers (for the components as well as for the individuals), the probability of a major outbreak, and the final fraction getting infected. Using numerical illustrations with fixed rates of infection and natural recovery, it is shown that the Test-and-Trace strategy is effective in reducing the reproduction number. Surprisingly, the reproduction number for the branching process of components is not monotonically decreasing in the tracing probability, but the individual reproduction number is conjectured to be monotonic as expected. Further, in the situation where individuals also self-report for testing, the tracing probability is more influential than the screening rate (measured by the fraction of infected individuals being screened).


Introduction
An important reason for modelling the spread of infectious diseases lies in better understanding the effect of various preventive measures, such as lockdown, social distancing, contact tracing, testing, self-isolating and quarantining. During the ongoing pandemic of Covid-19 the Test-and-Trace (TT) strategy has received a lot of attention (Lucas et al. 2020; Bradshaw et al. 2021). The test-part of the strategy means that testing (of suspected cases and/or randomly chosen individuals) is increased, with the hope that finding infected individuals quickly and isolating them will reduce transmission. The trace-part of the strategy is that individuals who test positive are quickly questioned about their recent contacts, and such contacts are then located, tested and isolated if they test positive.
Contact tracing and its effect have been studied both from a theoretical perspective and, during the ongoing Covid-19 pandemic, from an applied point of view, including procedures for estimation of model parameters. For Covid-19 it has been observed in several countries that contact tracing is a highly powerful intervention measure. For instance, in the UK study of the TT-program first carried out on the Isle of Wight, it was concluded that the number of new confirmed Covid-19 cases decreased more sharply after the TT-intervention. Lucas et al. (2020) found that strict self-isolation policies are unlikely to improve the effectiveness by means of contact tracing. In particular, the effect of contact tracing on controlling the Covid-19 epidemic has been studied based on simulation models in several papers (e.g. Di Domenico et al. 2020; Firth et al. 2020; Keeling et al. 2020; Kretzschmar et al. 2020). One benefit of simulation models is that the results are easier to interpret, whereas one shortcoming is that they are difficult to analyse analytically. Our paper is concerned with rigorous large population approximations for a stochastic epidemic model using theory for branching processes.
More theoretical studies often make simplifying assumptions in order to make more analytical progress. For example, under the assumption of a homogeneous mixing population, Ball et al. (2011, 2015) consider the traditional SIR model with forward tracing (where either a fraction of the infectees of a parent case is tested or none of the contacts is reported) but without tracing backwards to infectors of tested individuals. The model in Bradshaw et al. (2021) suggests that both backward and forward tracing would remarkably increase the effectiveness of Covid-19 epidemic control. Müller et al. (2000) deal with a stochastic SIRS model in a homogeneous mixing population. Once infectious individuals are discovered, each of their possible infectious contacts is traced and treated (equivalent to isolated) with some probability. The model is analysed (with focus on the age since infection) in three successive cases, namely backward tracing, forward tracing and tracing both ways. They derive the critical tracing probability for reducing the effective reproduction number to below 1, so that a major outbreak can no longer occur. Ball et al. (2011) is concerned with an SIR epidemic model in a homogeneous mixing population. Diagnosed individuals in Ball et al. (2011) are asked to name a fraction of their infectees, who will be isolated (i.e. removed) immediately if they have not been diagnosed earlier. Further, those traced individuals are asked to name their infectious contacts in the same way, otherwise none of their contacts will be named. The model studied in Ball et al. (2015) extends the one in Ball et al. (2011) by introducing an exposed period and tracing delays, where the infectious individuals who share the same infector are traced after independent delay times. It is also assumed that untraced individuals may not be asked to name their infectees, for instance when they are asymptomatic. Numerical results in Ball et al. (2015) indicate that independent delay times have a bigger effect on the spreading compared to the situation where the delay times from one infected individual being contact traced are all the same. Recently, Müller and Hösel (2021) add super-spreader events, where several individuals may get infected by one infector at the same time, to a model having contact tracing similar to Müller et al. (2000).
Besides forward and backward tracing, Mancastroppa et al. (2021) suggest that there is a "sideward" tracing when tracing large gatherings (where infected asymptomatic individuals could be traced even if they are neither the infectees nor the infector of the index case). Additionally, Barlow (2020) analyses the epidemic model as a branching process with contact tracing on top of its genealogical tree. According to the assumptions made there, each infective is detected with some probability after a certain number of generations, and then each of its contacts is successfully traced with some probability. As in our paper, the focus is on the evolution of "traceable clusters" (in our paper called "to-be-reported components"), but from a different point of view: the percolation-based analysis is extended to contact tracing, and an approximate expression for the probability of extinction is given.
In the present paper we consider an SIR model with TT strategy in which not all individuals are necessarily traced and tracing is both backward and forward, but where on the other hand we assume no latent periods and no delay before contact tracing happens. In summary, we study a stochastic SIR epidemic model including Test-and-Trace prevention for a large finite population. The Test-feature is modelled by assuming that infectious individuals are tested (screened) at a constant rate, and for tracing we assume that an individual who tests positive reports each contact independently, and that reported contacts are traced and tested without delay. In the tracing procedure we assume that both currently infectious individuals and those who have by now recovered are identified, and that the tracing procedure is iterated for both categories.
More precisely, we analyse a homogeneous SIR epidemic model having four parameters: the rate of infectious contacts β, the recovery rate γ , the testing rate δ for infectious individuals, and the fraction p of contacts that are reached in the contact tracing procedure. Using large population approximations, we analyse both the initial phase of the epidemic (where it behaves like a certain branching process) and the main phase of the epidemic when it can be approximated by a deterministic process. The main focus of the paper is to shed light on how much is gained from the TT-strategy. For example, how should resources optimally be distributed between testing and contact tracing, how much would the reproduction number be reduced for achievable levels in the TT-strategy, and so on.
In Sect. 2 we present the model details, our main results and some intuition for how the results are obtained. In Sects. 3 and 4 we give more details and proofs for the analysis of the initial stage of the epidemic, and in Sect. 5 for its main phase. In Sect. 6 we report simulations and numerical studies of the model and study how effective the TT-strategy is. The paper ends with a conclusion where extensions and possible improvements are discussed.

The Standard SIR Model
We start with a Markovian SIR (Susceptible → Infectious → Recovered) epidemic spreading in a closed and homogeneous mixing population. By closed, we mean that there is no influx of new susceptibles and no death. At any time point, each individual is either susceptible, infectious or recovered. We assume that the size of the population is n, and that initially one individual is infectious and the rest are susceptible. Each susceptible becomes infectious once he/she makes contact with an infective. The times at which such contacts between two given individuals occur are given by a homogeneous Poisson process with rate β/n. Equivalently, an infectious individual has contacts at rate β, each time with a uniformly selected individual, each individual thus having probability 1/n of being selected. Only contacts with susceptibles result in infection, whereas other contacts have no effect. Once an individual gets infected, he/she remains infectious for a random time $T_I$. We assume that the infectious periods are independent and exponentially distributed with mean $E[T_I] = 1/\gamma$, where the parameter γ denotes the rate of natural recovery. So the underlying model is Markovian. Once naturally recovered, the individual plays no role in the spreading of the epidemic. The epidemic stops when there are no infectives.
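The dynamics just described can be simulated event by event. The following is a minimal sketch, assuming a standard Gillespie-type scheme; the function name `simulate_sir` and its signature are illustrative and not part of the paper.

```python
import random


def simulate_sir(n, beta, gamma, seed=None):
    """Simulate the Markovian SIR epidemic; return the final number infected.

    Illustrative sketch: one initial infective, n - 1 susceptibles.
    """
    rng = random.Random(seed)
    s, i = n - 1, 1
    t = 0.0
    while i > 0:
        # Each infective has contacts at rate beta; a contact reaches a
        # susceptible with probability s / n.
        rate_inf = beta * s * i / n
        rate_rec = gamma * i
        total = rate_inf + rate_rec
        t += rng.expovariate(total)  # waiting time to the next event
        if rng.random() < rate_inf / total:
            s, i = s - 1, i + 1      # infection
        else:
            i -= 1                   # natural recovery
    return n - s                     # everyone ever infected, incl. the initial case
```

Calling `simulate_sir(1000, 1.5, 1.0, seed=1)` returns the final size of one realisation; repeating over many seeds gives an empirical picture of the split between minor and major outbreaks.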

The Markovian SIR-TT Model
Now we incorporate our Test-and-Trace scheme into this SIR model. It is additionally assumed that infectious individuals are tested at rate δ (possibly non-infectious individuals are also tested at this rate, but this has no effect and is hence not assumed). Individuals that test positive are called diagnosed, and diagnosed individuals are immediately isolated, thus taking no further part in disease spreading. So, infectious individuals can stop spreading disease either through natural recovery (rate γ) or through being tested and diagnosed (rate δ). Individuals that are diagnosed are also contact traced. This is modelled by assuming that a diagnosed individual reports each of its infectious contacts (both the infector and the infectees) independently with probability p. The individuals that are traced in this way are tested, and individuals that test positive (either still being infectious or by then having recovered) are then contact traced in the same way (so contact tracing is iterated among those that have been infected). To simplify modelling we assume no delay in this contact tracing and instead assume that it all happens instantaneously.
The SIR-TT model makes two simplifying assumptions: that contact tracing occurs without delay, and that traced individuals who have by now recovered are also contact traced. In reality tracing certainly takes some time, and individuals who have recovered several days or even weeks earlier would typically not be contact traced. The results from the present model can hence serve as an upper bound on how effective "real" contact tracing may be. All contact and reporting processes as well as infectious periods are assumed to be mutually independent. In Table 1 we list all the model parameters. It is possible to consider different models for how different individuals would report in relation to each other. For instance, would an infector A report an infectee B independently of whether the infectee B would report the infector A? A "yes" answer could for example arise if what defines a "contact" is not clarified enough, so that what A considers a contact may not coincide with what B thinks, and a "no" could arise if some of the contacts are with friends/acquaintances and other contacts are between unknown people, e.g. on the bus. However, it is clear from the model description that once A or B is diagnosed and asked to report their contacts, the reporting event in the opposite direction is useless. This is true also if the first reporting event resulted in not naming the other individual: a later contact tracing of that individual has no effect on the first individual since he/she has already been diagnosed. As a consequence, all models for how contacts report each other (independently, symmetrically or with some partial dependence) will result in exactly the same stochastic model. In our description below we have chosen to use the symmetric description, thus assuming that A reports B if and only if B reports A, but this is only for practical purposes.
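Under the symmetric description, each new infection can be labelled at once as "to be reported" (probability p) or not, so every infective carries a component label and a diagnosis removes all infectious members of that component. The sketch below simulates only the order of events (no clock); the function name, the dictionary bookkeeping and the returned pair are illustrative assumptions, not the authors' implementation.

```python
import random


def simulate_sir_tt(n, beta, gamma, delta, p, seed=None):
    """Embedded jump chain of the SIR-TT epidemic (illustrative sketch).

    Returns (number ever infected, number removed by diagnosis or tracing).
    """
    if gamma + delta <= 0:
        raise ValueError("need gamma + delta > 0 so that the epidemic ends")
    rng = random.Random(seed)
    s = n - 1
    sizes = {0: 1}        # component label -> number of infectious members
    next_label = 1
    diagnosed = 0
    while sizes:
        infectious = sum(sizes.values())
        rate_inf = beta * s * infectious / n
        rate_rec = gamma * infectious
        rate_diag = delta * infectious
        # The acting individual is uniform among infectives, so its component
        # is picked with probability proportional to the component size.
        labels = list(sizes)
        label = rng.choices(labels, weights=[sizes[c] for c in labels])[0]
        u = rng.random() * (rate_inf + rate_rec + rate_diag)
        if u < rate_inf:
            s -= 1
            if rng.random() < p:          # to-be-reported contact
                sizes[label] += 1
            else:                         # root of a new component
                sizes[next_label] = 1
                next_label += 1
        elif u < rate_inf + rate_rec:     # natural recovery
            sizes[label] -= 1
            if sizes[label] == 0:
                del sizes[label]
        else:                             # diagnosis removes the whole component
            diagnosed += sizes.pop(label)
    return n - s, diagnosed
```

Recovered members need no bookkeeping here because, with instantaneous iterated tracing, they only serve to keep the remaining infectious members of their component connected, which the shared label already encodes.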
Our model assumes that infectious individuals either recover naturally (at rate γ ) or are tested and diagnosed at rate δ. There is an alternative model interpretation, which is more detailed in the sense that testing could take place also prior to screening. In addition to those who are found by screening, some infectious individuals may test themselves, e.g. due to symptoms. This scenario also fits into the present model by simply adding one more parameter. The parameters γ and δ are unchanged: γ is the rate of natural recovery and δ denotes the testing rate (screening). But now we add a rate ν at which infectious individuals self-report and test themselves. Both self-reporting and screening trigger contact tracing, so all that matters for the epidemic spreading is the sum ν + δ of these two rates. As a consequence, this new model interpretation with 5 parameters (β, γ, δ, ν, p) is identical to the original model with the following 4 parameter values (β, γ, δ + ν, p). Since the alternative model interpretation fits into the original model all mathematical results from the original model apply. In Sect. 6 we give some numerical results also for the alternative model.

Main Results
We start by considering the beginning of the epidemic where we prove that the epidemic, asymptotically as the population size grows to infinity, converges to a certain limit process.
The limit process is defined as follows. There is one alive ancestor at time zero. Each alive individual gives birth at rate β, dies naturally at rate γ and is removed from the population at rate δ (corresponding to being tested and diagnosed in the epidemic). Further, when an individual is removed, each of its offspring as well as its parent is also removed, independently with probability p. Each individual removed in this step in turn causes its parent and each of its offspring to be removed independently with probability p, and so on. This limit process is identical to the SIR-TT epidemic process defined in Sect. 2.2 (denoted by $E_n(\beta, \gamma, \delta, p)$) with one single exception. In the epidemic model an infectious individual infects new individuals at rate $\beta S_n(t)/n$, where $S_n(t)$ denotes the number of susceptibles at time t, since only contacts with susceptibles result in infection and such a contact has probability $S_n(t)/n$. In the limit process, on the other hand, alive individuals give birth at the constant rate β. Nevertheless, in the beginning of the epidemic and in a large population these two rates are close to each other, since then $S_n(t) \approx n$.
The contact tracing mechanism induces a dependence between individuals, both in the epidemic and in the limiting process. Rather than studying individuals we therefore analyse the process of to-be-reported components (of individuals). More precisely, at each new infection/birth it is immediately decided whether the involved individuals would report each other (with probability p) or not. If so, the new individual belongs to the same component; if not, the newly infected/born individual creates a new to-be-reported component. The reason for studying this more complicated description of the same process is that the to-be-reported components of the limit process behave completely independently, thus making it a branching process. It is hence possible to use theory for branching processes to determine whether the process is sub- or super-critical and to derive the probability of extinction/minor outbreak. We are now ready for our first main result.
When proving Theorem 1 (Sect. 3) we use coupling methods (Andersson and Britton 2000; Ball and Donnelly 1995) to show that, during any finite time period, the epidemic described in terms of to-be-reported components converges to the branching process of to-be-reported components. Having done this, it remains to derive properties of such a limiting to-be-reported component. It turns out that such a component can be described by a jump Markov chain having births (size increased by one), deaths (size decreased by one) and killing (the whole component being removed), all occurring at linear rates. Suppose that there are currently k alive individuals in the component. Each such individual gives birth to a new to-be-reported individual at rate βp, so the total birth rate is kβp. Each individual dies naturally at rate γ, so the overall death rate equals kγ. Finally, the whole component is removed as soon as one of the k alive individuals is removed, so this happens at rate kδ. Until the component is removed, it generates index cases of new independent to-be-reported components at rate kβ(1 − p). This describes the evolution of the to-be-reported components. Viewed as a branching process, the most interesting quantity is the distribution of the number of offspring Z (= roots of new to-be-reported components) that one to-be-reported component produces before being removed. The mean of the offspring distribution, corresponding to the reproduction number of the components in the epidemic setting, is then given by $R_*^{(c)} = E[Z]$. By considering the jump Markov chain we can write the total offspring Z as a sum, $Z = \sum_{i=1}^{N_C} X_i$, where $N_C$ denotes the number of jumps the Markov process makes until it is removed, and $X_i$ denotes the number of newly generated roots of components between the (i − 1)-th and i-th jump.
Because all three jumps the process can make (birth, death and removal) happen at linear rates, the current number of alive individuals only affects the speed of the process but not which jump it makes. As a direct consequence, the variables $X_1, X_2, \dots$ are not only independent but also identically distributed. The corresponding explicit expressions are derived in Sect. 4. The reproduction number defined above is for the to-be-reported components (the average number of new components a component produces before being removed, i.e. before it is completely diagnosed or dies out undetected). Even though the original limit process is not a branching process, it is possible to determine the effective reproduction number $R_*^{(ind)}$ for it. Its interpretation is easier: it equals the average number of individuals a typical infected individual infects during the early stage of the epidemic.
In Sect. 4 we derive a relation between the two reproduction numbers, which involves the expected number of individuals born in a component before the component is removed.
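The component description lends itself to a quick Monte Carlo check of the component reproduction number. The sketch below simulates the embedded jump chain of a single component, using the fact that all rates are proportional to the current size k, so the type of the next event does not depend on k. It is an illustration under that observation, not the derivation of Sect. 4, and the function names are hypothetical.

```python
import random


def sample_component_offspring(beta, gamma, delta, p, rng):
    """Sample Z, the number of new component roots produced by one component."""
    if delta <= 0:
        raise ValueError("delta must be positive so that the component is removed")
    total = beta + gamma + delta
    k, z = 1, 0                      # k = alive members, z = roots produced so far
    while k > 0:
        u = rng.random() * total
        if u < beta * (1 - p):
            z += 1                   # birth of a not-to-be-reported individual
        elif u < beta:
            k += 1                   # birth inside the component
        elif u < beta + gamma:
            k -= 1                   # natural death of one member
        else:
            break                    # diagnosis: the whole component is removed
    return z


def estimate_component_reproduction_number(beta, gamma, delta, p,
                                           runs=20000, seed=0):
    """Monte Carlo estimate of E[Z]."""
    rng = random.Random(seed)
    draws = (sample_component_offspring(beta, gamma, delta, p, rng)
             for _ in range(runs))
    return sum(draws) / runs
```

For p = 0 every component consists of a single individual, so the estimate should be close to the familiar value β/(γ + δ), which gives a simple sanity check.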
Remark 1 It is easily observed that $R_*^{(ind)} < 1$ if and only if $R_*^{(c)} < 1$, and similarly for "=" and ">". The limit process is hence sub-critical (i.e. will die out with probability 1) when $R_*^{(c)} < 1$, and super-critical (so will grow beyond all limits with positive probability) when $R_*^{(c)} > 1$. By the equivalence just noted, the same holds true with $R_*^{(ind)}$ in place of $R_*^{(c)}$. This indicates the following corollary.
Remark 2 In Sect. 6 the two reproduction numbers are computed numerically for different parameter values. Surprisingly, the component reproduction number $R_*^{(c)}$ turns out not to be monotonically decreasing in the tracing probability p. However, the individual reproduction number seems to be decreasing in p, as expected. We have not been able to produce a formal proof of this result.
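Corollary 1 Let $Z_n$ denote the final number, and $\bar{Z}_n = Z_n/n$ the final fraction, of individuals that get infected during the entire epidemic. If $R_*^{(ind)} \le 1$, it then follows that $\bar{Z}_n \xrightarrow{p} 0$, namely there will be a minor outbreak for sure. If $R_*^{(ind)} > 1$, then a minor outbreak occurs with probability $\pi < 1$, where $\pi$ is the smallest solution in $[0, 1]$ of $s = \rho_Z(s) = \rho_{N_C}(\rho_X(s))$, and where $\rho_Z$, $\rho_{N_C}$ and $\rho_X$ are the probability generating functions of $Z$, $N_C$ and $X$, respectively.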
In the last part of Sect. 4, we give special attention to the case where there is no natural recovery (γ = 0) which accordingly can be called the SI-TT model. In this situation, the expressions become simpler and are given in the following corollary.

Corollary 2
In the SI-TT model having γ = 0, closed-form expressions are available for the component reproduction number (Eq. (10)), the individual reproduction number (Eq. (11)) and the minor outbreak probability.

Remark 3 Again in this case, we see from Eqs. (10) and (11) that $R_{*,SI\text{-}TT}^{(c)}$ is smaller than, equal to or larger than 1 if and only if $R_{*,SI\text{-}TT}^{(ind)}$ is smaller than, equal to or larger than 1, respectively.
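For orientation, the SI-TT case can be worked out by hand from the component jump chain. With γ = 0 every jump is either a birth (probability βp/(βp + δ)) or removal of the whole component, so the number of jumps is geometric with mean (βp + δ)/δ, and each jump is preceded on average by β(1 − p)/(βp + δ) new roots. The sketch below encodes the resulting expressions; they are our own re-derivation under these assumptions and are not copied from Eqs. (10) and (11), so they should be checked against the paper before being relied upon.

```python
def si_tt_reproduction_numbers(beta, delta, p):
    """Re-derived SI-TT reproduction numbers (gamma = 0); illustrative sketch.

    Component level: mean jumps (beta*p + delta)/delta times mean roots per
    jump beta*(1 - p)/(beta*p + delta).
    Individual level: (births in component + new roots) / (members of the
    component), with expected births beta*p/delta.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    r_component = beta * (1 - p) / delta
    expected_births = beta * p / delta
    r_individual = (expected_births + r_component) / (1 + expected_births)
    return r_component, r_individual
```

Under these expressions the individual reproduction number simplifies to β/(βp + δ), and both numbers cross the threshold 1 exactly when β(1 − p) = δ, in line with Remark 3.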
We now switch attention to the main phase of the epidemic rather than its beginning (the corresponding proofs are given in Sect. 5). In order to bypass the initial phase of the epidemic we assume a small initial fraction ε > 0 of infectives (instead of only one initial infective). Further, we assume that contact tracing only takes place for the contacts resulting in infection, not for the contacts between infectious individuals and individuals who have already been infected.
We start by introducing notations for the epidemic and its limiting process, where we keep track of the fraction of susceptibles as well as the fractions of infectives belonging to to-be-reported components with each given number of infectives.

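Let $S_n(t)$ denote the number of susceptibles at time t, $I_{n,j}(t)$ the number of infectives belonging to to-be-reported components containing j infectives, and $R_n(t)$ the number of individuals who have stopped being infectious, including both the naturally recovered and the diagnosed. Let $E^{(n)} = \{E^{(n)}(t); t \ge 0\}$, with components $S_n(t)/n$ and $I_{n,j}(t)/n$, $j \ge 1$, be the stochastic epidemic density process, which becomes infinite-dimensional as the population size n goes to infinity.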
The limiting deterministic process denoted by E ∞ = {E ∞ (t); t ≥ 0} = {s(t), i 1 (t), i 2 (t), · · · } is obtained by considering the jumps that the components make. An infection in a j-component moves the component to a ( j +1)-component implying that S is reduced by 1, I j reduced by j and I j+1 increased by j + 1. A natural recovery in such a component increases R by 1, decreases I j by j and increases I j−1 by j − 1. Finally, a test-detection in such a component reduces I j by j and increases R by j.
In Sect. 5, we prove the following theorem.

Theorem 2 For t ≥ 0, let s(t) be the community fraction of susceptibles, $i_j(t)$ the fraction of infectious individuals belonging to a to-be-reported component containing j infectives, and $i(t) = \sum_{j \ge 1} i_j(t)$ the community fraction of infectives. Then the infinite-dimensional stochastic epidemic process $E^{(n)}$ converges, as n → ∞, to the deterministic process $E_\infty$ defined by Eqs. (13)-(18), a system of differential equations for s(t), $i_1(t)$ and $i_j(t)$, j ≥ 2, with the corresponding initial configuration, on any finite time interval $[0, t_{end}]$.
When proving the theorem above, we first truncate both systems such that there is a maximal component size K making the processes finite dimensional, for which theory of population processes gives convergence. Then we argue that the component sizes for the original processes will be exponentially small in maximal component size, thus making the truncated models good approximations of the original processes.
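A truncated system of this kind is easy to integrate numerically. The sketch below builds the right-hand side directly from the jump description of the components given above (infection, natural recovery and test-detection in a j-component), so its form may differ from Eqs. (13)-(18). Two choices are our own assumptions: at the maximal size K, to-be-reported infections are treated as starting new singleton components, and every initial infective starts its own component. The step size must be small compared with 1/(K(β + γ + δ)) for the explicit Runge-Kutta scheme to remain stable.

```python
def sir_tt_rhs(state, beta, gamma, delta, p):
    """Time derivative of state = [s, i_1, ..., i_K] (truncated at size K)."""
    s, comp = state[0], state[1:]
    K = len(comp)
    i_total = sum(comp)
    d = [0.0] * K
    d[0] += beta * (1 - p) * s * i_total        # roots of new components
    for j in range(1, K + 1):
        ij = comp[j - 1]
        grow = beta * p * s * ij                # to-be-reported infections
        if j < K:
            d[j - 1] -= j * grow                # the j-component disappears ...
            d[j] += (j + 1) * grow              # ... and becomes a (j+1)-component
        else:
            d[0] += grow                        # truncation assumption (see lead-in)
        d[j - 1] -= j * (gamma + delta) * ij    # recovery or diagnosis in a j-component
        if j > 1:
            d[j - 2] += (j - 1) * gamma * ij    # recovery leaves a (j-1)-component
    return [-beta * s * i_total] + d


def rk4_step(f, y, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(y)
    k2 = f([a + 0.5 * dt * b for a, b in zip(y, k1)])
    k3 = f([a + 0.5 * dt * b for a, b in zip(y, k2)])
    k4 = f([a + dt * b for a, b in zip(y, k3)])
    return [a + dt * (b + 2 * c + 2 * e + g) / 6
            for a, b, c, e, g in zip(y, k1, k2, k3, k4)]


def final_fraction_removed(beta, gamma, delta, p,
                           eps=1e-3, K=10, t_end=100.0, dt=0.01):
    """Integrate the truncated system and return r(t_end) = 1 - s - sum_j i_j."""
    state = [1.0 - eps, eps] + [0.0] * (K - 1)
    f = lambda y: sir_tt_rhs(y, beta, gamma, delta, p)
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(f, state, dt)
    return 1.0 - sum(state)
```

With δ = 0 and p = 0 the system reduces to the standard deterministic SIR model, which provides a check of the implementation against the classical final-size equation.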
If the SIR-TT model is started with one initial infective, the time it takes until a fraction ε, i.e. a number nε, have been infected tends to infinity with n. For this reason the process with this initial condition does not converge to the deterministic process above. Similarly, the end of the epidemic, where the final small fraction gets infected, also takes longer and longer time the larger n is. However, as in many similar but simpler epidemic models, we expect that, when it comes to the final number getting infected, the start of the epidemic determines whether there is a major outbreak or not, and the end of the epidemic has negligible effect. We formulate this more precisely in the following conjecture.
Conjecture 1 Consider the SIR-TT epidemic starting with one initial infective. The final fraction infected $\bar{Z}_n$ converges to a two-point distribution ζ, where ζ = 0 (minor outbreak) happens with probability π, and with probability 1 − π, $\zeta = r_\infty = \lim_{\varepsilon \to 0} \lim_{t \to \infty} r(t)$ (major outbreak), where π is defined in Corollary 1.

Remark 4 In Sect. 6, we show several simulations in support of Conjecture 1, which also indicate that the distribution of $\bar{Z}_n$ appears to satisfy a central limit theorem concentrated around the deterministic limit $r_\infty$.
Finally, in Sect. 6 we perform simulations and numerical illustrations confirming our results and investigating the effect of the Test-and-Trace strategy for parameter values inspired by the Covid-19 pandemic.

Proof of Theorem 1
In this section, we aim to approximate the early stages of the epidemic using large population approximations. We first denote the sequence of our epidemic processes with one initial infective by $\{E_n(\beta, \gamma, \delta, p), n \ge 1\}$, where we recall that β is the rate of infection, γ is the rate of natural recovery, δ denotes the testing (diagnosis) rate and p is the probability of a contact being reported.
Then we describe the limiting process, denoted by $E(\beta, \gamma, \delta, p)$. At time t = 0, there is only one initial ancestor. Each individual gives birth at rate β during its lifetime, dies naturally (naturally recovers) at rate γ and is removed (diagnosed) at rate δ. Once an individual is removed, each of its offspring and its parent is said to be reported and is immediately removed, independently with probability p. In turn, every parent and offspring of those who are removed is removed independently with probability p as well, and so on. In particular, if a to-be-reported individual has already died naturally, its alive to-be-reported offspring (or parent) will also be removed. Moreover, we notice that each not-to-be-reported offspring becomes a new ancestor which independently produces a process of the same type. Finally, we give the proof of Theorem 1.

Proof of Theorem 1
First of all, it is worth noting that the two processes $E_n(\beta, \gamma, \delta, p)$ and $E(\beta, \gamma, \delta, p)$ behave in the same way apart from one slight difference. A contact occurs in the epidemic whenever a birth occurs in the branching process, whereas a contact is "effective", i.e. results in an infection, only if a susceptible is contacted. In the n-th epidemic, the probability that a contact made by an infective is with a susceptible is $S_n(t)/(n - 1)$ ($\approx S_n(t)/n$ when n is large), where $S_n(t)$ is the number of susceptibles at time t. So, an infective infects new individuals at rate $\beta S_n(t)/n$. In contrast, an alive individual in the limiting process gives birth at rate β. However, if the population size n is large and we are in the beginning of the epidemic, so that $S_n(t) \approx n$, then $\beta S_n(t)/n \approx \beta$, i.e. the rate of new infections of susceptibles in $E_n(\beta, \gamma, \delta, p)$ is close to the birth rate in $E(\beta, \gamma, \delta, p)$.
In contrast to the early stage approximation of the standard SIR epidemic (Andersson and Britton 2000; Ball and Donnelly 1995), the number of alive individuals in the limiting process $E(\beta, \gamma, \delta, p)$ does not behave like a branching process, since several deaths may occur at the same time and the jumps of this limiting process are therefore not restricted to up or down by one.
On the other hand, if the limiting process is described in terms of to-be-reported components, then it behaves like a branching process. A component starts with one newly born (infected) individual which would not report its infector, and we call this individual the root of the component. During its lifetime (infectious period) this individual gives birth to new individuals, some of which will be reported and others will not. Each of the not-to-be-reported individuals becomes the root of a new component, whereas those who will be reported belong to the same component. Given that there are currently k to-be-reported individuals in one component, each such individual gives birth at rate β, where each newborn belongs to the same component with probability p and generates a new component with probability 1 − p. Thus, each individual in this component gives birth to new to-be-reported individuals at rate βp, and the total birth rate is kβp. Each alive (infectious) individual dies naturally (naturally recovers) at rate γ. The whole component is diagnosed if and only if one of those k to-be-reported individuals is diagnosed, implying that the death rate of this process of components is kδ. Until all these k individuals are removed, the component generates roots of new components at rate kβ(1 − p). This describes the birth and death of the to-be-reported components.
By applying the coupling method (see Andersson and Britton 2000; Ball and Donnelly 1995), we show that the epidemic process $E_n(\beta, \gamma, \delta, p)$ described in terms of components converges to the branching process of to-be-reported components. A contact (infection) in the epidemic corresponds to a birth in the branching process. The branching process and the epidemic process of components are perfectly coupled with each other up until the time $T_n$ when the first "ghost" appears, where by "ghost" we mean a newly contacted individual which has already been infected in the epidemic. If we label the i-th contact as $c_i$, then for any time $t_0 \ge 0$, the event $T_n \ge t_0$ that no "ghost" has occurred before time $t_0$ is equivalent to the event that all the contacts $c_1, \dots, c_{[t_0]}$ are distinct. Using the classic birthday-problem method, we see that $P(T_n \ge t_0) \to 1$ as n → ∞. This completes the proof of Theorem 1.
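The birthday-problem step can be illustrated numerically. The sketch below computes the probability that k contacts, each made with an individual chosen uniformly among n, are all distinct; this is the classic birthday calculation, used here as a simplified stand-in for the "no ghost" event, and the function name is illustrative.

```python
def prob_all_contacts_distinct(n, k):
    """P(k independent uniform draws from n individuals are all distinct)."""
    prob = 1.0
    for i in range(k):
        prob *= 1.0 - i / n   # the (i+1)-th contact avoids the i earlier ones
    return prob
```

For a fixed number of contacts k the probability tends to 1 as n grows (roughly 1 − k(k − 1)/(2n) for large n), which is the mechanism behind the coupling holding on any finite time interval.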

Properties of the Limiting Branching Process
Now we explore the properties of the limiting branching process $E(\beta, \gamma, \delta, p)$ of to-be-reported components, which can be used to approximate the epidemic during the early phase.

Process of the To-be-Reported Components in the Full Model
First, we note that the reporting decisions can be made in advance, and recall that we use the same reporting decision in both directions between each pair of individuals, since at most one direction will be used. We then focus on the Markov jump process of a component, having births, deaths and sudden killing of the whole component. We define the size k of a to-be-reported component as the number of alive (infectious) individuals in the component and hence ignore the dead (recovered) individuals. A component currently having size k produces roots of new components at rate kβ(1 − p). The component itself remains at size k for an exponentially distributed time with rate k(βp + γ + δ), after which the next event is one of the following three. The first case is that a new infection occurs, at rate kβp, which corresponds to increasing the size of the component by one. Secondly, each of the individuals in the component recovers naturally at rate γ. If this happens, the size of the component decreases by one. The remaining case is that one of the to-be-reported individuals is diagnosed, so that the whole component is eliminated by diagnosis, at rate kδ; upon this event, the size of the component goes down to zero.
In Fig. 1, we show an example illustrating how a "reporting branching tree" of to-be-reported components grows: at first, we have a newly infected case, namely the node 1. An edge goes from one node to another node, meaning that the latter one is infected by the former one. A dashed edge between two nodes stands for the not-to-be-reported case, whereas a full edge stands for the to-be-reported case. After a certain period, there is a to-be-reported component, denoted by $C_1$, produced by its root 1, and three newly generated roots 2, 3 and 4, each of which produces its own to-be-reported component, denoted by $C_2$, $C_3$ and $C_4$, respectively. Furthermore, the white nodes stand for "infectious", the grey ones for "naturally recovered" and the black ones for "diagnosed". We can see that at this stage, when the roots 2 and 4 are diagnosed, the whole components $C_2$ and $C_4$ are reported and immediately diagnosed.

Fig. 1 Example of a "reporting tree": The white nodes stand for "infectious", the grey ones for "naturally recovered", and the black ones for "diagnosed". A directed edge from one node to another means that the latter one is infected by the former one. Full edges reflect to-be-reported contacts (probability p) and dashed ones those not to be reported
Our interest is to derive an important quantity for our epidemic model, namely the effective component reproduction number $R^{(c)}_*$, defined as the expected number $R^{(c)}_* = E[Z]$ of roots of new components generated by one root before its component is removed (completely diagnosed or dies out undetected). Since we aim to examine the effect of testing and tracing, for fixed rates β and γ we consider $R^{(c)}_* = R^{(c)}_*(\delta, p)$ as a function of the testing rate δ and the tracing probability p. Later, in Sect. 6, we show how $R^{(c)}_*$ varies with the testing fraction δ/(δ + γ) and the tracing probability p.
To find the distribution of Z, we first discuss the number of events before the whole component is removed, by computing the probability that the whole component is not removed before k events. Recall that exactly one of three events occurs at each event time: a birth occurs at rate kβp, the death of the whole component happens at rate kδ, and the size of the component decreases by one at rate kγ. As a consequence, the probability of a birth, which increases the component size by one, is given by
$$\frac{\beta p}{\beta p + \gamma + \delta},$$
the probability of death of the whole component equals
$$\frac{\delta}{\beta p + \gamma + \delta},$$
and the probability that the component size decreases by one is given by
$$\frac{\gamma}{\beta p + \gamma + \delta}.$$
A different way of describing the evolution of the component is to consider increases and decreases by one (a simple random walk!) until some time when the whole component dies simultaneously. It is worth pointing out that the random walk may reach zero by itself and hence stop before a simultaneous death. The non-symmetric simple random walk $\{S_m, m \ge 0\}$ on $\mathbb{Z}$ starts at 1 ($S_0 = 1$), and for m = 1, 2, 3, ... the jumps of the random walk are independent and identically distributed with jump probabilities
$$P(S_m - S_{m-1} = 1) = \frac{\beta p}{\beta p + \gamma}, \qquad P(S_m - S_{m-1} = -1) = \frac{\gamma}{\beta p + \gamma}.$$
On top of this, each jump may result in diagnosis of the whole component (simultaneous death), which happens with probability δ/(βp + γ + δ) at each jump. The number $N_D$ of events until the whole component is eliminated by diagnosis is hence geometrically distributed with parameter δ/(βp + γ + δ).
Next we derive the probability that the random walk does not hit zero before k jumps. Let $N_{rw}$ denote the first time at which the random walk hits zero. Since the walk starts at 1, it can only hit the origin after an odd number of steps, so $P(N_{rw} = m) = 0$ for m even. For m = 2j − 1 with j = 1, 2, ..., we apply the Hitting Time Theorem in Hofstad and Keane (2008), which yields that the probability of first hitting zero at the m-th step is given by
$$P(N_{rw} = m) = \frac{1}{m} P(S_m = 0),$$
where the probability that the (unrestricted) random walk equals 0 at the m-th step is
$$P(S_m = 0) = \binom{2j-1}{j} \left(\frac{\gamma}{\beta p + \gamma}\right)^{j} \left(\frac{\beta p}{\beta p + \gamma}\right)^{j-1},$$
since in this case the random walk must have taken (j − 1) up-jumps and j down-jumps. We conclude that the probability of the random walk not hitting zero before k steps equals
$$P(N_{rw} \ge k) = 1 - \sum_{m=1}^{k-1} P(N_{rw} = m).$$
Moreover, let $N_C$ denote the number of jumps up until the whole to-be-reported component is extinct (either from simultaneous diagnosis or from all individuals having recovered naturally), i.e. $N_C = \min\{N_{rw}, N_D\}$.
Recalling that $N_D$ is geometrically distributed, the probability that the whole component has not gone extinct before k = 1, 2, ... events is given by
$$P(N_C \ge k) = P(N_{rw} \ge k)\, P(N_D \ge k) = \left(\frac{\beta p + \gamma}{\beta p + \gamma + \delta}\right)^{k-1} P(N_{rw} \ge k).$$
Now it is sufficient to analyse the number $X_i$ of newly generated roots of components between the (i − 1)-th and the i-th jump. Given a to-be-reported component of size k at that time, roots of new components are generated at rate kβ(1 − p) during an exponential time with parameter k(βp + γ + δ). This implies that $X_i$ is geometrically distributed with parameter
$$\frac{\beta p + \gamma + \delta}{\beta + \gamma + \delta}.$$
This parameter is independent of k, implying that the variables $X_1, X_2, \dots$ are independent and identically geometrically distributed as X. So, between any two jumps, the probability of k newly generated roots is given by
$$P(X = k) = \left(\frac{\beta(1-p)}{\beta + \gamma + \delta}\right)^{k} \frac{\beta p + \gamma + \delta}{\beta + \gamma + \delta}, \qquad k = 0, 1, 2, \dots$$
Based on the former discussion, we conclude that the total number of roots of new components produced by a to-be-reported component can be written as
$$Z = \sum_{i=1}^{N_C} X_i.$$
As stated in Eq. (3), due to independence, it follows that
$$R^{(c)}_* = E[Z] = E[N_C]\, E[X],$$
where the expectation of X is given by
$$E[X] = \frac{\beta(1-p)}{\beta p + \gamma + \delta}.$$
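The derivation above can be turned into a short numerical routine. The following is a sketch under stated assumptions: it takes the random walk to be independent of the diagnosis events, so that P(N_C ≥ k) = P(N_rw ≥ k) P(N_D ≥ k), uses the tail-sum identity E[N_C] = Σ_k P(N_C ≥ k), and truncates the series at `kmax` terms. The function names are illustrative and not taken from the paper.

```python
from math import comb

def hitting_pmf(j, up, down):
    """P(N_rw = 2j - 1): first hit of zero at step 2j - 1 for a walk started at 1."""
    m = 2 * j - 1
    return comb(m, j) * down**j * up**(j - 1) / m

def expected_jumps(beta, gamma, delta, p, kmax=100):
    """Truncated series for E[N_C] = sum over k >= 1 of P(N_rw >= k) * P(N_D >= k)."""
    up = beta * p / (beta * p + gamma)
    down = 1.0 - up
    no_diag = (beta * p + gamma) / (beta * p + gamma + delta)  # a jump is not a diagnosis
    total = 0.0
    hit_before_k = 0.0                     # running value of P(N_rw < k)
    for k in range(1, kmax + 1):
        total += (1.0 - hit_before_k) * no_diag ** (k - 1)
        if k % 2 == 1:                     # zero can only be hit at odd steps
            hit_before_k += hitting_pmf((k + 1) // 2, up, down)
    return total

def component_reproduction_number(beta, gamma, delta, p, kmax=100):
    """R_c = E[N_C] * E[X], with E[X] the mean number of roots between two jumps."""
    mean_x = beta * (1 - p) / (beta * p + gamma + delta)
    return expected_jumps(beta, gamma, delta, p, kmax) * mean_x

if __name__ == "__main__":
    for beta in (0.40, 0.50, 0.59, 0.67):
        print(beta, round(component_reproduction_number(beta, 0.25, 0.125, 0.5), 3))
```

The binomial coefficients are exact integers, so the direct formula is safe for `kmax` up to a few hundred; a much larger truncation would need a recursive evaluation of the hitting probabilities to avoid overflow when converting to floating point.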

Proof of Corollary 1
In the following text we give the proof of Corollary 1.

Intuitively, the limiting process of components becomes extinct with probability one when E[Z] ≤ 1. This implies that only a minor outbreak can occur if the component reproduction number $R^{(c)}_* = E[Z]$ is smaller than or equal to one. When $R^{(c)}_* > 1$, the branching process may grow without bound, so a major outbreak in the epidemic is possible. We now focus on the probability π of a minor outbreak and the probability 1 − π of a major outbreak.
We assume that there is one initial infective. As discussed in the previous section, the probability of a minor outbreak in the epidemic can be approximated by the probability of extinction of the limiting process at the early stage of the outbreak. Given k newly generated roots, the conditional probability of extinction is clearly $\pi^k$. Thus, the probability π is the solution on [0, 1] of the following equation:
$$\pi = \sum_{k=0}^{\infty} \pi^k P(Z = k), \qquad (19)$$
where the right-hand side of Equation (19) is exactly the probability generating function $\rho_Z(\pi)$ of Z. For the computation of $\rho_Z$, we have
$$\rho_Z(s) = E\left[s^{Z}\right] = \rho_{N_C}(\rho_X(s)),$$
where the probability generating function $\rho_X$ of X is given by
$$\rho_X(s) = \frac{\beta p + \gamma + \delta}{\beta(1-p)(1-s) + \beta p + \gamma + \delta},$$
and the probability generating function $\rho_{N_C}$ of $N_C$ is given by
$$\rho_{N_C}(s) = \sum_{k=1}^{\infty} s^k P(N_C = k).$$
Finally, we obtain the probability generating function $\rho_Z$ of Z:
$$\rho_Z(s) = \sum_{k=1}^{\infty} \left(\rho_X(s)\right)^k P(N_C = k). \qquad (20)$$
Solving the equation $\pi = \rho_Z(\pi)$ with Eq. (20) on [0, 1] gives the smallest solution π, which equals the probability of a minor outbreak in the epidemic with one initial infective. In addition, if there are initially m infectives, a minor outbreak occurs with probability $\pi^m$.
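The smallest root of π = ρ_Z(π) can be approximated by iterating the generating function from 0, a standard device for branching processes. The sketch below assumes the composition ρ_Z(s) = ρ_{N_C}(ρ_X(s)), the independence of the random walk and the diagnosis events, and a truncation of the distribution of N_C at `kmax` terms. All function names are illustrative.

```python
from math import comb

def jump_count_pmf(beta, gamma, delta, p, kmax=100):
    """Return [P(N_C = k) for k = 1..kmax], where N_C = min(N_rw, N_D)."""
    up = beta * p / (beta * p + gamma)
    down = 1.0 - up
    no_diag = (beta * p + gamma) / (beta * p + gamma + delta)
    at_least = []                          # at_least[k - 1] = P(N_C >= k)
    hit_before_k = 0.0                     # running value of P(N_rw < k)
    for k in range(1, kmax + 2):
        at_least.append((1.0 - hit_before_k) * no_diag ** (k - 1))
        if k % 2 == 1:                     # zero can only be hit at odd steps k = 2j - 1
            j = (k + 1) // 2
            hit_before_k += comb(k, j) * down**j * up**(j - 1) / k
    return [at_least[i] - at_least[i + 1] for i in range(kmax)]

def pgf_z(s, beta, gamma, delta, p, pmf):
    """Truncated generating function of Z evaluated at s."""
    rate = beta * p + gamma + delta
    rho_x = rate / (rate + beta * (1 - p) * (1 - s))   # generating function of X
    return sum(prob * rho_x**k for k, prob in enumerate(pmf, start=1))

def minor_outbreak_probability(beta, gamma, delta, p, kmax=100, iterations=500):
    """Iterate s <- rho_Z(s) from s = 0; the limit is the smallest fixed point."""
    pmf = jump_count_pmf(beta, gamma, delta, p, kmax)
    s = 0.0
    for _ in range(iterations):
        s = pgf_z(s, beta, gamma, delta, p, pmf)
    return min(s, 1.0)

if __name__ == "__main__":
    print(minor_outbreak_probability(0.75, 0.25, 0.125, 0.5))
```

Starting the iteration at 0 matters: the iterates increase monotonically to the smallest fixed point, whereas starting at 1 would stay at the trivial root.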
We then aim to derive the effective individual reproduction number $R^{(ind)}_* = R^{(ind)}_*(\delta, p)$, which equals the expected number of infections generated by a random infectious individual (before being tested or recovering).
We start with a new expression for our effective component reproduction number. Let $I_c$ be the total number of individuals who have been infected in a to-be-reported component before it goes extinct, starting with one single infectious individual. For k ≥ 1, let $p_k = P(I_c = k)$ be the probability that in total k individuals have been infected in a component, and let $r_k$ be the expected number of new roots generated by such a component, given that $I_c = k$. It then follows that
$$R^{(c)}_* = \sum_{k=1}^{\infty} p_k r_k.$$
Further, during the early stage of the epidemic, the probability $\tilde{p}_k$ that an individual belongs to a component with $I_c = k$ is given by the size-biased distribution
$$\tilde{p}_k = \frac{k p_k}{\sum_{j \ge 1} j p_j} = \frac{k p_k}{\mu_c}, \qquad \mu_c = E[I_c].$$
Given that $I_c = k$, (k − 1) infections occur within the component and on average $r_k$ infections lead out of the component before it dies. In total such a component hence generates $k - 1 + r_k$ infections on average, and a randomly chosen individual in it hence infects $((k - 1) + r_k)/k$ on average. This implies that our effective individual reproduction number is given by
$$R^{(ind)}_* = \sum_{k=1}^{\infty} \tilde{p}_k \, \frac{k - 1 + r_k}{k}. \qquad (21)$$
Simplifying the expression for $R^{(ind)}_*$ in Equation (21) gives us
$$R^{(ind)}_* = \frac{R^{(c)}_* + \mu_c - 1}{\mu_c}.$$
The equation above shows that $R^{(ind)}_*$ is smaller than, equal to, or larger than 1 if and only if $R^{(c)}_*$ is smaller than, equal to, or larger than 1. Intuitively, we can explain this relation between $R^{(ind)}_*$ and $R^{(c)}_*$ as follows: a component contains on average $\mu_c$ individuals, who together cause $\mu_c - 1$ infections within the component and $R^{(c)}_*$ infections that start new components, so the average number of infections per individual is $(R^{(c)}_* + \mu_c - 1)/\mu_c$. It remains to compute the expected number $\mu_c$ of individuals who have been infectious in a component, since we have already derived $R^{(c)}_*$. We start by letting $J_+ = \sum_{k=1}^{\infty} I_k$ denote the number of up-jumps of the random walk $\{S_n, n \ge 0\}$ before it dies out, where $I_k$ is the indicator variable with $I_k = 1$ if the k-th jump of the random walk is an up-jump, i.e. for any k ≥ 1,
$$P(I_k = 1) = \frac{\beta p}{\beta p + \gamma + \delta}\, P(N_C \ge k).$$
Then we conclude that
$$\mu_c = 1 + E[J_+] = 1 + \frac{\beta p}{\beta p + \gamma + \delta}\, E[N_C].$$
Together with Eq. (7), we prove the final expression for $R^{(ind)}_*$ given by
$$R^{(ind)}_* = \frac{R^{(c)}_* + \mu_c - 1}{\mu_c}.$$

Remark 5 This individual reproduction number has the correct threshold property, since it equals 1 exactly when $R^{(c)}_*$ does.
However, $R^{(ind)}_*$ cannot be interpreted as the average number of infections caused by infected people in the beginning of the outbreak. This is because of delicate issues concerning the timing of events, closely related to those explained in Ball et al. (2016).
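As a numerical companion to the relation between the two reproduction numbers, the sketch below evaluates the component number, the mean component size and the individual number together. It rests on two assumptions that are readings of the argument above rather than quoted formulas: E[N_C] is computed from a series truncated at `kmax`, and the mean number of up-jumps is taken to be βp/(βp + γ + δ) times E[N_C]. Function names are illustrative.

```python
from math import comb

def expected_jumps(beta, gamma, delta, p, kmax=100):
    """Truncated series for E[N_C], the mean number of jumps of a component."""
    up = beta * p / (beta * p + gamma)
    down = 1.0 - up
    no_diag = (beta * p + gamma) / (beta * p + gamma + delta)
    total, hit_before_k = 0.0, 0.0
    for k in range(1, kmax + 1):
        total += (1.0 - hit_before_k) * no_diag ** (k - 1)
        if k % 2 == 1:                      # first hit of zero only at odd steps
            j = (k + 1) // 2
            hit_before_k += comb(k, j) * down**j * up**(j - 1) / k
    return total

def reproduction_numbers(beta, gamma, delta, p, kmax=100):
    """Return (R_c, mu_c, R_ind) under the assumptions stated in the lead-in."""
    rate = beta * p + gamma + delta
    n_c = expected_jumps(beta, gamma, delta, p, kmax)
    r_c = n_c * beta * (1 - p) / rate       # roots of new components per component
    mu_c = 1.0 + n_c * beta * p / rate      # root plus expected number of up-jumps
    r_ind = (r_c + mu_c - 1.0) / mu_c       # infections per individual
    return r_c, mu_c, r_ind

if __name__ == "__main__":
    print(reproduction_numbers(0.75, 0.25, 0.125, 0.5))
```

At the critical value β = 0.5 (with γ = 0.25, δ = 0.125, p = 0.5) both reproduction numbers come out equal to one, which illustrates the threshold property of Remark 5.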

The Limiting Process in the SI-TT Model
In the SI-TT model there is no natural recovery: γ = 0. This special case turns out to give simpler explicit expressions. The reason for the simplification is that a component can then only go extinct because an infectious individual is diagnosed (resulting in the whole to-be-reported component being contact traced), whereas in the general model extinction may also happen because all individuals have recovered before a new to-be-reported infection took place.
We are interested in the number $Z_{SI-TT}$ of roots of new components produced by a to-be-reported component before it dies (i.e. is diagnosed). As before, the current number of infectious individuals does not affect the probability of the next jump being a new root or a diagnosis event. We can hence neglect the infections within the component and conclude that a geometrically distributed number of roots is produced before the component is diagnosed. The parameter of the geometric distribution is simply the probability of a diagnosis rather than a new root: δ/(δ + β(1 − p)). In conclusion, for any k = 0, 1, 2, ..., the unconditional distribution of $Z_{SI-TT}$ is given by
$$P(Z_{SI-TT} = k) = \left(\frac{\beta(1-p)}{\delta + \beta(1-p)}\right)^{k} \frac{\delta}{\delta + \beta(1-p)}.$$
Then, in the SI-TT model, we are able to derive the effective component reproduction number as the mean of this geometric distribution, $R^{(c)}_{*,SI-TT} = E[Z_{SI-TT}] = \beta(1-p)/\delta$.
Following the idea of the proof of Corollary 1, we prove Corollary 2 as follows.

Proof of Corollary 2 By finding the smallest solution s on [0, 1] of the equation s = ρ(s), with ρ(·) the probability generating function of $Z_{SI-TT}$, we obtain that if $R^{(c)}_{*,SI-TT} > 1$, the probability of a minor epidemic outbreak equals
$$\pi = \frac{\delta}{\beta(1-p)} = \frac{1}{R^{(c)}_{*,SI-TT}}.$$
In the case when $R^{(c)}_{*,SI-TT} \le 1$, the branching process goes extinct with probability π = 1, implying that a major outbreak occurs with probability 0.
Moreover, using the same idea as for the effective individual reproduction number in the general case, we first obtain the expected number of individuals infected in a component (including its root) before it is diagnosed, given by
$$\mu_c = 1 + \frac{\beta p}{\delta}.$$
Then, as stated in Sect. 2, the effective individual reproduction number for this case without natural recovery has the form
$$R^{(ind)}_{*,SI-TT} = \frac{R^{(c)}_{*,SI-TT} + \mu_c - 1}{\mu_c} = \frac{\beta}{\beta p + \delta}. \qquad (24)$$

Remark 6 Similar to Remark 5, this individual reproduction number possesses the correct threshold property but not the traditional interpretation as the average number of infections per individual in the beginning of the epidemic. It is also easily observed from Eq. (24) that the effective individual reproduction number in this SI-TT model is monotonically decreasing in the tracing probability p.
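The SI-TT quantities are simple enough to evaluate directly. The following sketch assumes the geometric distribution of the number of roots with parameter δ/(δ + β(1 − p)) and the general relation between the component and individual reproduction numbers; the mean component size 1 + βp/δ used here is an inferred reading, and the function name is hypothetical.

```python
def si_tt_summary(beta, delta, p):
    """Closed-form SI-TT quantities (gamma = 0), assuming delta > 0."""
    r_c = beta * (1 - p) / delta                  # mean of the geometric number of roots
    pi_minor = 1.0 if r_c <= 1.0 else 1.0 / r_c   # smallest root of s = rho(s)
    mu_c = 1.0 + beta * p / delta                 # assumed mean component size
    r_ind = (r_c + mu_c - 1.0) / mu_c             # simplifies to beta / (beta*p + delta)
    return r_c, pi_minor, r_ind

if __name__ == "__main__":
    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(p, si_tt_summary(0.75, 0.125, p))
```

Looping over p as in the usage lines shows the individual reproduction number decreasing in p, in line with Remark 6.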

Proof of Theorem 2
In the previous section, we applied coupling methods to approximate the epidemic at its early stage. In this section, we give an approximation of the main phase, where the epidemic is initiated with a positive fraction ε of infectives. Here, we describe our original full model through the evolution of the clumps. By "clumps" we mean the to-be-reported components, and we only need to keep track of the number of infectious individuals in each clump. In a population of size n, we assume that the number of initial infectives equals εn and the number of initial susceptibles equals (1 − ε)n. At time t ≥ 0, let $S^{(n)}(t)$ be the number of susceptible individuals, with initial value
$$S^{(n)}(0) = (1 - \varepsilon) n.$$
For j = 1, 2, ..., n, let $I^{(n)}_j(t)$ be the number of individuals that are infectious and belong to a to-be-reported component currently containing j infectives. So the total number of infectious individuals at time t is
$$I^{(n)}(t) = \sum_{j=1}^{n} I^{(n)}_j(t),$$
with initial values
$$I^{(n)}_1(0) = \varepsilon n, \qquad I^{(n)}_j(0) = 0 \ \text{ for } j \ge 2.$$
Let $R^{(n)}(t)$ denote the number of individuals who are recovered (counting both naturally recovered and diagnosed), with initial value $R^{(n)}(0) = 0$. It is then clear that for any time t ≥ 0,
$$S^{(n)}(t) + I^{(n)}(t) + R^{(n)}(t) = n.$$
Next we prove Theorem 2, stating that the stochastic epidemic process, denoted by
$$E^{(n)}(t) = \left(S^{(n)}(t)/n,\ I^{(n)}_1(t)/n,\ I^{(n)}_2(t)/n,\ \dots\right),$$
converges to a deterministic process.

Below, we study the corresponding truncated processes, obtained by capping the clump sizes at some large positive integer K. The truncated processes are finite dimensional, so we can apply the theory for density dependent population processes. These results can then be extended to the original infinite dimensional systems (with arbitrary clump size) by observing that the probability that a clump exceeds size K is exponentially small in K. As a consequence, the truncated processes can be made arbitrarily close to the original infinite dimensional processes by choosing K large enough. The exponential bound is a direct consequence of the fact that the SIR-TT epidemic process may be dominated by the SI-TT process (without natural recovery), which has a geometrically distributed maximal clump size with parameter δ/(βp + δ). We omit the details of this argument and now show that the truncated stochastic epidemic process converges to the truncated deterministic system. More precisely, using Kurtz's theory of Markovian population processes (Andersson and Britton 2000), we show that the truncated stochastic "density" process, denoted by $E^{(n)}_K(t) = (S^{(n)}(t)/n, I^{(n)}_1(t)/n, I^{(n)}_2(t)/n, \dots, I^{(n)}_K(t)/n)$, converges to a (K + 1)-dimensional deterministic process which is defined by the finite system of differential equations below.
$$s'(t) = -\beta s(t) \sum_{j=1}^{K} i_j(t),$$
$$i_1'(t) = \beta(1-p)\, s(t) \sum_{j=1}^{K} i_j(t) - (\beta p\, s(t) + \gamma + \delta)\, i_1(t) + \gamma\, i_2(t),$$
and for j = 2, 3, ..., (K − 1) we have
$$i_j'(t) = j \beta p\, s(t)\, i_{j-1}(t) - j(\beta p\, s(t) + \gamma + \delta)\, i_j(t) + j \gamma\, i_{j+1}(t),$$
whereas in the case of j = K,
$$i_K'(t) = K \beta p\, s(t)\, i_{K-1}(t) - K(\beta p\, s(t) + \gamma + \delta)\, i_K(t).$$
The corresponding initial conditions are
$$s(0) = 1 - \varepsilon, \qquad i_1(0) = \varepsilon,$$
and
$$i_j(0) = 0, \qquad j = 2, \dots, K.$$

Essentially, we check whether we may use Theorem 5.2 of Andersson and Britton (2000) to show the convergence of the truncated density process $E^{(n)}_K$. First of all, we notice that there are several jumps which can affect the process. We write $x = (s, i_1, i_2, \dots, i_K)$ for the current state.

In the case of a new non-to-be-reported infection, the process changes by (−1, 1, 0, ..., 0), with the corresponding jump intensity function
$$f_{(-1,1,0,\dots,0)}(x) = \beta(1-p)\, s \sum_{j=1}^{K} i_j.$$
If a new to-be-reported infection occurs in a component of size 1, the process changes by (−1, −1, 2, 0, ..., 0), with the corresponding jump intensity function
$$f_{(-1,-1,2,0,\dots,0)}(x) = \beta p\, s\, i_1.$$
When a natural recovery occurs in a component of size 1, the process changes by (0, −1, 0, ..., 0), with the corresponding jump intensity function
$$f_{(0,-1,0,\dots,0)}(x) = \gamma\, i_1.$$
When a new to-be-reported infection occurs in a component of size j = 2, ..., (K − 1), the process changes by (−1, 0, ..., 0, −j, j + 1, 0, ..., 0), with the corresponding jump intensity function
$$f_{(-1,0,\dots,0,-j,j+1,0,\dots,0)}(x) = \beta p\, s\, i_j.$$
Moreover, if a new to-be-reported infection occurs in a component of size K, the process changes by (−1, 0, ..., 0, −K), with the corresponding jump intensity function
$$f_{(-1,0,\dots,0,-K)}(x) = \beta p\, s\, i_K.$$
For j = 2, ..., K, if a natural recovery occurs in a component of size j, the process changes by (0, 0, ..., 0, j − 1, −j, 0, ..., 0), with the corresponding jump intensity function
$$f_{(0,0,\dots,0,j-1,-j,0,\dots,0)}(x) = \gamma\, i_j.$$
Further, the process changes by (0, 0, ..., 0, 0, −j, 0, ..., 0) if a component of size j is diagnosed, and the corresponding jump intensity function is given by
$$f_{(0,0,\dots,0,0,-j,0,\dots,0)}(x) = \delta\, i_j.$$

Then we obtain the drift function F defined in Sect. 5.3 of Andersson and Britton (2000), which is here given by
$$F(x) = \sum_{l} l\, f_l(x),$$
where the sum runs over all jump vectors l listed above, so that the components of F(x) are the right-hand sides of the differential equations above. It can be shown that for any $x = (s, i_1, i_2, \dots, i_K)$ and $y = (s', i_1', i_2', \dots, i_K')$ in the domain there exists a bound M > 0 such that
$$|F(x) - F(y)| \le M\, |x - y|,$$
with the absolute norm | · | in $\mathbb{R}^{K+1}$. Such a bound M can be given explicitly in terms of β, γ, δ and K. Finally, we may apply Theorem 5.2 in Andersson and Britton (2000), which shows that on any bounded interval $[0, t_{end}]$ the truncated "density" process converges almost surely to the deterministic process defined by Equations (30)-(36). This convergence of the truncated processes, combined with the earlier sketch of why the truncated processes approximate the infinite systems well when K is chosen large, completes the proof of Theorem 2.
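To make the truncated deterministic system concrete, the sketch below integrates a version of it with a classical fourth-order Runge-Kutta scheme. The right-hand side is reconstructed from the jump descriptions in the proof (gains from infections and recoveries in neighbouring clump sizes, losses at rate j(βp s + γ + δ)) and should be read as an assumption rather than as the authors' Equations (30)-(36). The step-size rule is a heuristic for numerical stability, and all names are illustrative.

```python
def drift(state, beta, gamma, delta, p):
    """Right-hand side for state = [s, i_1, ..., i_K] of the truncated system."""
    s, comp = state[0], state[1:]
    K = len(comp)
    total_inf = sum(comp)
    out = [0.0] * (K + 1)
    out[0] = -beta * s * total_inf
    for j in range(1, K + 1):
        if j == 1:
            gain_inf = beta * (1 - p) * s * total_inf       # new roots of components
        else:
            gain_inf = j * beta * p * s * comp[j - 2]       # growth from size j - 1
        gain_rec = j * gamma * comp[j] if j < K else 0.0    # recovery in size j + 1
        loss = j * (beta * p * s + gamma + delta) * comp[j - 1]
        out[j] = gain_inf + gain_rec - loss
    return out

def rk4_step(state, dt, params):
    """One classical Runge-Kutta step."""
    def shifted(slope, h):
        return [x + h * v for x, v in zip(state, slope)]
    k1 = drift(state, *params)
    k2 = drift(shifted(k1, dt / 2), *params)
    k3 = drift(shifted(k2, dt / 2), *params)
    k4 = drift(shifted(k3, dt), *params)
    return [x + dt / 6 * (a + 2 * b + 2 * c + d)
            for x, a, b, c, d in zip(state, k1, k2, k3, k4)]

def removed_fraction(beta, gamma, delta, p, eps=0.01, K=100, t_end=100.0):
    """Approximate r(t_end) = 1 - s - sum_j i_j, starting from i_1(0) = eps."""
    dt = min(0.05, 1.0 / (K * (beta + gamma + delta)))      # heuristic stability bound
    steps = int(t_end / dt) + 1
    dt = t_end / steps
    state = [1.0 - eps, eps] + [0.0] * (K - 1)
    for _ in range(steps):
        state = rk4_step(state, dt, (beta, gamma, delta, p))
    return 1.0 - sum(state)

if __name__ == "__main__":
    print(removed_fraction(0.75, 0.25, 0.125, 0.5, K=30))
```

With p = 0 and δ = 0 the system collapses to the standard SIR equations, which offers an independent check through the classical final size equation.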

Original Model
In this section we perform simulations supporting our large population results, and also investigate the effect of the TT-strategy. We do this mainly for the following parameter values (inspired by the Covid-19 pandemic). Before the TT-strategy is applied, we have the Markovian SIR epidemic model with β = 0.75 and γ = 0.25, implying an average infectious period of 1/γ = 4 days and a basic reproduction number $R_0 = \beta/\gamma = 3$. When the TT-strategy is applied with fixed parameters, we assume that δ = 0.125 and p = 0.5, implying that a fraction δ/(δ + γ) = 1/3 of the infected individuals are tested and isolated while still infectious, and that half of their contacts are reported for contact tracing; Lucas et al. (2020) believed that the fraction p of contacts that were successfully traced varies between 40 and 80%. Moreover, in the following text, whenever computing the reproduction numbers $R^{(c)}_*$ and $R^{(ind)}_*$ numerically, we approximate the infinite sum in Eq. (5) by a finite sum with truncation size 100.
First we performed 10,000 simulations of the epidemic and stored the final number infected in each simulation. We did this for three different population sizes, n = 1000, 5000 and 10,000, each simulation starting with one initial infectious individual. We say (quite arbitrarily) that there is a minor outbreak when at most 10% get infected during the outbreak; otherwise a major outbreak occurs. We summarize the fraction of minor outbreaks and the empirical mean fraction of infected individuals among the major outbreaks in Table 2. To these simulations we add a line for the limiting results (denoted by n = ∞). In this line, the minor outbreak probability is derived using Eq. (9) with the sum truncated at 100, and the mean fraction infected in major outbreaks is computed numerically using Eqs. (30)-(36) with truncation size K = 100, where $r_\infty$ is approximated by r(t) for t = 100 days and ε = 0.01. The comparison gives evidence that our limiting approximations work quite well already for these moderate population sizes. In particular, we observe from the second column of Table 2 that the mean fraction infected in major outbreaks becomes closer to the deterministic limit $r_\infty$ for larger n. In Fig. 2b, d and f, the distribution for the major outbreaks is more peaked when the population is larger. We also note from those zoomed histograms for the major outbreaks (Fig. 2b, d and f) that they seem to follow a normal distribution centred at the deterministic limit, especially for larger n (see Remark 4).

Next, we illustrate the threshold results saying that when $R^{(c)}_* \le 1$ we expect only minor outbreaks to take place, whereas when $R^{(c)}_* > 1$ large outbreaks may also occur. We first fix the parameters (γ, δ, p) = (0.25, 0.125, 0.5), and choose β to be 0.40, 0.50, 0.59 and 0.67, so that the corresponding effective component reproduction number takes the values 0.75, 1.00, 1.25 and 1.50 using Eq. (3). Then for each value of $R^{(c)}_*$, we did 10,000 simulations of the epidemic with a fixed population size of 5000. In Table 3, we show the fraction of minor outbreaks and the mean fraction of infected individuals among the major outbreaks. We see that in the case of $R^{(c)}_* < 1$ there are nearly no major outbreaks, whereas more major outbreaks occur for $R^{(c)}_* > 1$. As $R^{(c)}_*$ grows, the outbreaks become larger.

We now study the time evolution of the epidemic, showing that it becomes less random as the population size n increases, as stated in Theorem 2. We do this by plotting random epidemic processes $\{I_n(t)/n\}$ and comparing them with the limiting deterministic process $\{i(t)\}$. As before, we use the parameter values (β, γ, δ, p) = (0.75, 0.25, 0.125, 0.5). More specifically, we plot the deterministic curves (in red) of the fraction of infectives when the population size is 1000, 5000 and 10,000, respectively. For each population size, we plot the fraction of infectives for one simulation (in black); we then did 10 simulations for each population size and plot the empirical mean of the fraction of infectives (in blue). We can see from Fig. 3 that the larger the population size, the better the truncated deterministic process approximates the epidemic process. All simulations were started with 1% being infectious ($I_n(0)/n = 0.01$) and the rest susceptible. The deterministic fraction of infectives is derived by solving Eqs. (30)-(36) with ε = 0.01 and truncation size K = 100.
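A simulation study of this kind can be sketched at the level of component sizes, since the final size only depends on how many infectives each to-be-reported component currently holds. The code below is an illustrative reimplementation under stated assumptions, not the authors' simulation code: infection uses the rate β S I/n, only the embedded jump chain is simulated (event times are not needed for the final size), and a diagnosis removes all currently infectious members of the chosen component. Function and parameter names are hypothetical.

```python
import random

def simulate_final_size(n, beta, gamma, delta, p, rng, initial=1):
    """Final number infected in one SIR-TT epidemic, tracked through component sizes."""
    susceptible = n - initial
    sizes = [1] * initial              # infectives per to-be-reported component
    infectious = initial
    while infectious > 0:
        inf_rate = beta * susceptible / n          # infection rate per infective
        u = rng.random() * (inf_rate + gamma + delta)
        # the acting infective is uniform, so its component is picked by size
        idx = rng.choices(range(len(sizes)), weights=sizes)[0]
        if u < inf_rate:
            susceptible -= 1
            infectious += 1
            if rng.random() < p:
                sizes[idx] += 1        # to-be-reported infection joins the component
            else:
                sizes.append(1)        # otherwise the infectee roots a new component
        elif u < inf_rate + gamma:
            sizes[idx] -= 1            # natural recovery
            infectious -= 1
        else:
            infectious -= sizes[idx]   # diagnosis removes the whole component
            sizes[idx] = 0
        if sizes[idx] == 0:            # drop empty components (swap with last)
            sizes[idx] = sizes[-1]
            sizes.pop()
    return n - susceptible

def minor_outbreak_fraction(n, beta, gamma, delta, p, runs, seed=0, threshold=0.1):
    """Fraction of runs in which at most threshold * n individuals get infected."""
    rng = random.Random(seed)
    minor = sum(simulate_final_size(n, beta, gamma, delta, p, rng) <= threshold * n
                for _ in range(runs))
    return minor / runs

if __name__ == "__main__":
    print(minor_outbreak_fraction(1000, 0.75, 0.25, 0.125, 0.5, runs=200))
```

Selecting a component with weights proportional to its size costs time linear in the number of components per event, which is acceptable for populations of a few thousand but would call for a faster sampling structure in larger studies.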
Moreover, we investigate the effect of the TT strategy. We recall that δ denotes the rate of testing (either broad screening or more targeted testing), where those who test positive are isolated immediately, and p denotes the fraction of all contacts of infectious individuals that are successfully contact traced. In Fig. 4, we plot the effective reproduction numbers $R^{(c)}_*$ and $R^{(ind)}_*$, derived by Eqs. (3) and (7), respectively, as functions of the fraction δ/(δ + γ) of infectives being tested (before natural recovery) in [0, 0.5] and of p in [0, 1], keeping the other two parameters fixed at β = 0.75 and γ = 0.25. Figure 4a shows that, surprisingly, $R^{(c)}_*$ is not monotone in p, whereas Fig. 4b shows that the individual reproduction number $R^{(ind)}_*$ seems to be, as expected. As seen from the contour lines where $R^{(ind)}_*$ = 2.5, 2, 1.5, 1, the lines are steeper for lower $R^{(ind)}_*$.

Fig. 2 Histogram of the final size in 10,000 simulations of the epidemic with population size n = 1000 (a-b), n = 5000 (c-d) and n = 10,000 (e-f), starting with one initial infective, with the full histogram to the left and a zoom on the major outbreaks to the right, with fitted normal curve in red

Fig. 3 Fraction of infectious individuals over time with 1% initial infectives. The fraction of infectious individuals for one stochastic simulation is in black, the deterministic one is in red, and the empirical mean of ten simulations is in blue
When it comes to comparing the effects of p and δ/(δ + γ) on $R^{(ind)}_*$, it seems as if δ/(δ + γ) is more influential for high values of $R^{(ind)}_*$ (larger than 2.5 in this case), whereas for lower values of $R^{(ind)}_*$ (smaller than 2) tracing is more influential in preventing a major outbreak (i.e. in reducing $R^{(ind)}_*$ below 1).
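The kind of parameter scan behind such heat maps can be sketched as follows. As a shortcut, this sketch replaces the truncated sum by a closed form for E[N_C], obtained here by summing the series with the standard generating function of the first-passage time of a simple random walk. That closed form, the expression for the individual reproduction number built on it, and all function names are assumptions of this sketch and are not quoted from the paper. It requires δ > 0.

```python
from math import sqrt

def reproduction_numbers(beta, gamma, delta, p):
    """Return (R_c, R_ind) using a closed form for E[N_C]; requires delta > 0."""
    rate = beta * p + gamma + delta
    a, b, q = beta * p / rate, gamma / rate, delta / rate   # up, down, diagnosis
    if a == 0.0:
        g = b                                # the only possible natural exit is one step down
    else:
        g = (1.0 - sqrt(max(0.0, 1.0 - 4.0 * a * b))) / (2.0 * a)
    n_c = (1.0 - g) / q                      # E[N_C]
    r_c = n_c * beta * (1 - p) / rate
    r_ind = beta * n_c / (rate + beta * p * n_c)
    return r_c, r_ind

def scan(beta=0.75, gamma=0.25, fractions=(0.05, 0.1, 0.2, 0.3, 0.4, 0.5), steps=10):
    """Evaluate both numbers on a grid of testing fractions and tracing probabilities."""
    table = {}
    for f in fractions:
        delta = gamma * f / (1.0 - f)        # invert f = delta / (delta + gamma)
        for i in range(steps + 1):
            p = i / steps
            table[(f, p)] = reproduction_numbers(beta, gamma, delta, p)
    return table

if __name__ == "__main__":
    grid = scan()
    for p in (0.0, 0.5, 1.0):
        print(p, grid[(0.1, p)])
```

On this grid the component number at testing fraction 0.1 is larger at p = 0.5 than at p = 0, which reproduces the non-monotonicity noted for Fig. 4a, while the individual number decreases in p along every row.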
In the lower panels we show the corresponding heat maps, but now for the case γ = 0.2 and ν = 0.05, implying that only 1/5 of infectives self-report, which is closer to the original model where no individuals get tested prior to the introduction of screening. It is seen that whether the component reproduction number is monotone in p or not depends on the fraction of infectives that self-report when having symptoms. Another difference compared with the original model is that the tracing probability p clearly has a bigger impact on reducing $R^{(ind)}_*$. An explanation would be that both self-reporting individuals and those found by screening are contact traced.
Furthermore, if we assume that infectives who develop symptoms recover only due to diagnosis/self-reporting, then the fraction of asymptomatic individuals is exactly the fraction γ/(γ + ν) of infectives who do not self-report and recover naturally (without screening). The contour lines in Fig. 5b, where the fraction of asymptomatic infectives is smaller, are steeper than those in Fig. 5d. This implies that tracing plays an even bigger role in reducing the individual reproduction number when a larger fraction of individuals is symptomatic.

Conclusions and Discussion
In this paper we have analysed a Markovian epidemic model incorporating the effect of testing and contact tracing (the Markovian SIR-TT model). By analysing the process of to-be-reported components, rather than the individuals themselves, it was shown that the early stage of the epidemic can be approximated by a suitable branching process, and that if an epidemic takes off, its behaviour becomes less random as the population size n increases. The reproduction numbers, both for the components and for the individuals, were derived. Their dependence on the amount of testing and the effectiveness of contact tracing was evaluated analytically as well as numerically. It was observed that the tracing probability p had a bigger impact on reducing the individual reproduction number than the fraction being tested through screening, and this difference was even more pronounced in the situation when some infectives self-test without being screened (the alternative model interpretation). Surprisingly, the reproduction number for the components was not monotonically decreasing in p, but the individual reproduction number seems to be (as expected).
There are several possible extensions that would make the model more realistic. For instance, the model assumes that there are no delays in either contact tracing or testing. The results in the present paper can hence be seen as a best possible scenario, and allowing for a delay would give information on how important such delays are and how much would be gained if contact tracing were quicker. Further, we make the simplifying assumption that traced individuals who have by then recovered are also contact traced (Müller et al. 2000 do not make this assumption). The model also assumes no latent period and that the infectious period follows an exponential distribution. Introducing a latent period most likely makes testing and contact tracing more effective, in that individuals may get screened as well as traced before even becoming infectious, but how to quantify this effect remains to be analysed. A different step towards realism would be to consider a structured community as opposed to the current assumption of a uniformly mixing community. Such structure could for example include households, spatial aspects, or some other network structure.
On the other hand, as the model was defined, only contacts that resulted in infection are considered for contact tracing. In reality it may of course also happen that contacts that did not result in infection are reported and traced. During the early phase of an epidemic such tracing events will rarely find new infected cases, but later in the epidemic, when transmission is extensive, they could (the individuals may have been infected by other individuals). Similarly, we do not consider contact tracing if an infectious individual has close contact with an individual who has already been infected, since such a contact does not result in infection. Allowing also for these types of contact tracing events is much harder to analyse and remains an open problem.
On the mathematical side, two conjectures deserve to be proven (or disproved). The first is the statement about the final size of the epidemic in case of a major outbreak starting with one infective (see Conjecture 1). As in many similar epidemic models, it seems highly plausible that this limiting final size agrees with that of the deterministic process when taking t to infinity and looking at ever smaller initial fractions ε, but a proof of this is missing. In addition, Fig. 2 shows that the final size seems to follow a normal distribution around the deterministic limit in the case where there is a major outbreak, so proving a related central limit theorem is another open problem. Further, and perhaps a lower hanging fruit, is to compute the proper effective individual reproduction number (see Remark 5), or to prove that the individual reproduction number $R^{(ind)}_*$ in this paper is the correct one and that it is monotonically decreasing both in p and in the testing fraction δ/(δ + γ).
From an applied point of view it is of course important to have parameter estimates in order to say something quantitatively useful. The model has four parameters: (β, γ, δ, p). The average infectious period 1/γ is quite often known from earlier studies, and when the basic reproduction number $R_0 = \beta/\gamma$ is known, an estimate of β is also available. The test-and-trace parameters δ and p, however, may be harder to estimate. If testing comes from general broad screening, δ could very well be available: if for instance 1% of the community is tested each day, this leads to δ = 0.01 with one day as time unit. If testing is targeted towards suspected cases, it might be harder to know the rate δ at which infectious people are tested. Finally, estimates of the fraction p of all infectious contacts that are detected by contact tracing are often hard to obtain. Perhaps a rough estimate could be obtained from studies investigating different types of contacts and how many infections they are responsible for. There are some statistical methods developed to estimate the tracing probability, e.g. a maximum-likelihood estimator in Müller and Hösel (2007) and an approximate Bayesian computation in Blum and Tran (2010). When it comes to digital contact tracing (by means of mobile tracing apps), the tracing probability p would approximately correspond to the square of the app-using fraction. With higher app adoption, app-based contact tracing is expected to be more effective than traditional contact tracing, potentially due to the quicker identification and notification of infectious contacts (see e.g. Jenniskens et al. 2021; Ferretti et al. 2020).
Analyses of epidemic models incorporating various preventive measures, and statistical studies relating to them, remain a research area deserving more attention in the future.