A simplified estimate of the Effective Reproduction Number $R_t$ using its relation with the doubling time and application to Italian COVID-19 data

A simplified method to compute $R_t$, the Effective Reproduction Number, is presented. The method relates the value of Rt to the estimation of the doubling time performed with a local exponential fit. The condition $R_t = 1$ corresponds to a growth rate equal to zero or equivalently an infinite doubling time. Different assumptions on the probability distribution of the generation time are considered. A simple analytical solution is presented in case the generation time follows a gamma distribution.


Introduction
The Effective Reproduction Number R t is one of the main parameters that controls the evolution of an infection. It recently gained importance during the COVID-19 pandemic outbreak and is used as one of the indicators to determine restrictive measures such as regional or national lock-downs.
Different algorithms for its computation are available [1] [2] [3] [4] [5], some of which are very CPU intensive. Implementations are also available as software packages [6] for a number of algorithms, and results are presented on websites [7] [8] [9] with regular updates.
CPU-effective algorithms offer the advantage that estimates can be derived in real time as soon as new data are published. Often, results of simplified algorithms don't differ too much from the results of more accurate methods, in particular due to the limited quality of input data.
The following proposes a simplified approach to the estimate of R t based on a determination of the doubling time, or equivalently the growth rate, which can be simply achieved with a regression procedure.

The Effective Reproduction Number, R t
We assume I t is the number of infected persons at the time t, measured as number of days from a conventional beginning of the epidemic, defined as t = 0.
Each contagious person can infect other people during his infection period. We assume that a person that got infected at a day d will infect, on average, a certain number of other persons that become infectious at the day t > d with a discrete probability distribution w s , with s = t − d. The newly infected people, on turn, may infect more people with the same mechanism. s = t − d is defined as the generation time in literature and corresponds to the time interval between infector-infected pair.
The probability distribution w s is normalized to unity: In practice, after a sufficiently large amount of time, i.e.: for a sufficiently large value of s, w s becomes negligible. An estimate of w s from Italian infection data, unfortunately from a limited number of cases, is published in [10] where w s is approximated with a gamma distribution. At a time t, the expected number of infected persons, E[I t ] can be determined from I d , d = 0, · · · , t − 1, according to [3], as: or, equivalently, defining s = t − d, as: The simplest assumption on w s is a constant generation time g, which is equivalent w s = δ gs where δ gs is a Kronecker delta, i.e.: w g = 1 and w s = 0 for s = g. In this case, Eq 3 becomes: For COVID-19, the average generation time, defined as the mean value of a gamma distribution fitted to the Italian data, is g = 6.7 ± 1.9 days [10]. The Robert Koch Institute (RKI) takes instead for Germany the value g = 4 that gives a very simple estimateR or the smoother ratio of the moving averages over g days: Usually, the moving average over few days does not sufficiently smooth the distribution of the number of daily infected cases I t . In particular, the lower number of swab tests taken during the weekend causes a "ripple" structure that requires a further smoothing to be applied to the input data before evaluating Eq. 6. Figure 1 shows the number of daily confirmed cases, I t for Italy according to public COVID-19 Italian data from the Italian Dipartimento di Protezione Civile. The large dispersion of data is clearly visible, in particular around the more stable moving average over 7 days, also shown in the figure.

Relation between R t and doubling time
Another indicator of the growth of the epidemic is the doubling time τ 2 defined as the time required to double the number of infected persons, assuming an exponential growth.
Given n consecutive counts of infected people, I t−n+1 , · · · , I t , the following function model can interpolate the n counts: or, equivalently: The growth rate λ is related to the doubling time τ 2 by: Estimates of A and λ, or equivalently τ 2 , can be determined with a numerical fit procedure. In particular, the exponential fit can be conveniently implemented as a linear regression on log I t .
Assuming R t = R constant during the considered time interval, the evolution model in Eq. 4 represents an exponential growth. In a time period formed by a number of days n which is an integer multiple of g: n = N g, we have: where h = t − n + 1, or: Changing the base from R to e gives: E[I t ] = e (n log R)/g I h .

Simplified algorithm
We studied the progression of the COVID-19 pandemic in Italy, considering the data published on daily basis by Italian Dipartimento di Protezione Civile [13]. For each day t, we perform an exponential fit to the n last days' counts, I t−n+1 , · · · , I t . We determine an estimateτ 2 of the doubling time τ 2 , or an equivalent estimateλ of the growth rate λ, from a fit of the model in Eq. 7 or Eq. 8. Then, assuming a reasonable estimate g of the average generation time, we estimate R t according to Eq. 14 as:R t = e gλ = e (g log 2)/τ2 = 2 g/τ2 .
There are some advantages of Eq. 15 compared to the simplified model from Eq. 6: -Eq. 15 can also be applied in case g, the average generation time, is not an integer, while Eq. 6 must approximate g to the nearest integer. -The exponential fit better follows an exponential growth in the considered time interval, as it is the case when R t is a constant, with respect to a moving average.
At the cost of a modest increase in the computing time, yet maintaining very good speed, we consider the method proposed here to be more flexible and reliable compared to the method adopted in [4]. Moreover, the data smoothing can be tuned by including a sufficient number of points in the fit. In this way, no preliminary smoothing of the data is needed before the application of the algorithm. In the following sections, we will introduce extensions of Eq. 15 that allow a more precise determination ofR t than with the simplified assumption that w s = δ gs , i.e.: s is constant and equal to g.

Uncertainty estimate
Given Eq. 15, the uncertainty onR t is determined by the uncertainties onλ (or τ 2 ) and the uncertainty on g. Namely, if σλ and σ g are the uncertainties onλ and g, respectively, within a Gaussian error approximation, the variance of R t is given by: The error onR t is: The uncertainty onλ derives from the exponential fit procedure, while the uncertainty on g depends on how well the probability distribution of the generation time w s is known. From [10], the estimate of w s and its average g for COVID-19 in Italy is known from a limited number of cases.
In particular, whenλ = 0 (infinite doubling time), which corresponds toR t = 1, σ g doesn't contribute to theR t uncertainty. This means that an imperfect assumption on g does not affect the conditionR t = 1 which is important to determine the turning point of infection, from growing to receding, or vice versa.
The uncertainty computed in Eq. 17 does not take into account the systematic uncertainty due to the assumed approximation that the generation time s is constant, and equal to g. Moreover, the assumption of Gaussian uncertainties may not hold for an asymmetric distribution.
6 Effect of finite width in the w s distribution The deviation of w s from the hypothesis of a constant generation time s = g may be approximately estimated in the continuum approximation. Eq. 3 for a continuous time variable t may be rewritten as: where ρ(t) and i(t) are the continuum equivalent of R and I t , respectively. The normalization condition is: If s is a constant equal to g, we have w(s) = δ(s − g), where δ is a Dirac's delta function. Hence: Assuming an exponential growth i(t) = A e λt , one has: which gives the continuous version of Eq. 15, where ρ(t) = ρ is a constant: Assuming, instead, that w(s) deviates from the Dirac's delta assumption and has average value g and standard deviation σ, we may write Eq. 18 applying a series expansion of i(t − s) around s = g: We assume that w(s) 0 for s > t, so that the integration can be extended from 0 to ∞ instead of 0 to t. After the integration, in the first term the normalization condition of w(d) can be applied. The second term vanishes, and in the third term the definition of standard deviation σ of w(s) can be applied. Eq. 23 becomes: If we assume again i(t) = A e λt , hence i (t) = A λ 2 e λt , Eq. 24, becomes: The term A e λt simplifies. If λ 2 σ 2 1, we may write, approximately: hence: Equation 27 has already been reported in [12]. This result implies the width of the distribution w s has the effect to replace g in Eq. 22 with an "effective" generation time g eff that is somewhat smaller than the true average value and depends on λ according to: In order to take into account more details of the distribution, more terms may be added to Eq. 23. Those would add a dependency of ρ on the higher moments: skewness, kurtosis and possibly more, if required by the desired accuracy. Those cases are not considered in the present work.

"Exact" solution
If we assume, as in the previous section, that i(t) is an exponential function, or at least that it can be approximated to an exponential function within a time interval that is at least as wide as the time range in which w(s) is not negligible, ρ(t) can be computed "exactly", and is constant within that interval.
If we assume i(t) = A e λt , Eq. 18 becomes: Simplifying the term A e λt , as in the previous cases, ρ(t) can be computed as: If w(s) is negligible for values of s > t, we can extend the integration from 0 to ∞, and ρ(t) = ρ does not depend on t: This result is also reported in [12]. Note that if λ = 0, Eq. 31 becomes: The normalization of w(s) implies ρ = 1, regardless of the details of the probability distribution w(s).

The case of a gamma distribution
In [10], w(s) is approximated to a gamma distribution: where κ and θ, the shape and scale parameters, are determined with a fit to the Italian data. Equation 31 becomes: where the integration can be performed analytically: With some simplification of the Γ functions, the result is: The above equation is valid for −1/θ < λ < ∞. Again, λ = 0 corresponds to ρ = 1 for any values of κ and θ, as demonstrated in general in the previous section.
9 R t and τ 2 as indicators of the epidemic evolution R t is often used as indicator of the epidemic evolution. As we have seen, there is a very close relation between the Effective Reproduction Number and doubling time. The estimate of the doubling time τ 2 can be determine directly from the number of infected people, while R t also requires an estimate of the average generation time g, which propagates an extra uncertainty with respect to the estimate of τ 2 . The main feature of R t is the passage through the threshold value of one: R t > 1 indicates a growing phase, while R t < 1 indicates a receding phase of the epidemic. Those conditions are equivalent to τ 2 > 0 and τ 2 < 0, respectively, as evident form Eq. 15. In the caseλ = 0,R t is not affected by the uncertainty on the estimate of g.
For this reason, we consider τ 2 , or equivalently λ, a better indicator of the situation of the epidemic compared to R t , which may be of interest for other epidemiology purposes. Figure 2 shows R t , evaluated with the presented algorithm assuming a constant generation time, using the public Italian COVID-19 data released by the Italian Dipartimento di Protezione Civile [13]. Different values of the average generation time g have been assumed, from 3 to 7 days.

Results
The magnitude of the dependence of R t on g gives also a clue about the uncertainty on R t due to imperfect knowledge of g, which mainly affects the regions where R t is significantly different from 1. Figure 3 shows instead the evaluation performed with the three models discussed above: 1. Eq. 15, assuming a constant generation time of g = 6.7 days; 2. Eq. 28, assuming a mean value of 6.7 days and a standard deviation of 4.88 days; 3. Eq. 36, assuming a gamma distribution having parameters κ = 1.87 and θ = 3.57 days, respectively, as determined in [10] Note that the mean of the gamma distribution is equal to the product κθ.
All three methods give similar values for R t close to 1, but exhibit some discrepancy at more extreme values. Compared to the "exact" solution that assumes a gamma distribution (Eq. 36), assuming a fixed generation time (Eq. 15) gives a result that is about 9% larger at the highest value and about 4% larger at the lowest value. Including the contribution of the standard deviation term (Eq. 28) gives a reduction of about 12% at the larges value and 3% at the lowest value. Using (Eq. 15) with a lower "effective" g may improve the agreement with the "exact" solution at higher values at the cost of a poorer agreement at lower values. This is effectively done in the implementation of the RKI algorithm.  Figure 4 shows the application of different algorithms to the official Italian COVID-19 data published by the Italian Dipartimento di Protezione Civile [13]. The algorithm presented in this paper is noted as CovidStat and assumes a gamma distribution with the parameters reported above. It is compared with algorithms by Wallinga and Teunis [1], Bettencourt and Ribeiro [2], Cori et al. [3], and RKI [4]. Algorithms by Wallinga and Teunis and Cori et al. use the details of the probability distribution w s and are here implemented assuming the same w s as our algorithm. Bettencourt and Ribeiro uses a fixed time, that we have set to 7 days.
The method proposed here has been implemented with an exponential fit to the last 14 days. The RKI algorithm has been applied with generation time g = 5, since the original implementation with g = 4 showed significant discrepancy with respect to the other algorithms, consistently with what can be noted in Fig 3. A smoothing of the infection data with a Savitzky-Golay filter [14] using a time window of 15 days and a third-order polynomial was also applied to the infection data before applying the RKI algorithm.
The comparison of the proposed method with other algorithms shows a good agreement, considering the possible source of uncertainties and the intrinsic "ripple" structure of the data that may depend on the applied smoothing. In particular, agreement of our method is very good with the Wallinga-Teunis and with the Cori et al. algorithms. The agreement with the Bettencourt-Ribeiro is also good, considering that it includes a "ripple" structure due to the data fluctuations. The agreement with the RKI method is also reasonable after the assumed constant generation time is "tuned", with a residual disagreement for the cases where R t < 1. This feature is consistent with what can been observed comparing the "exact" solution computed for the gamma distribution to the one computed assuming a fixed generation time "tuned" to the more convenient value g = 5.5, as shown in Fig 5. Figure 6 shows the estimated growth rate λ and the corresponding R t for Italy data. Estimates are done with an exponential fit over the last 14 days. For R t the contribution to uncertainty due to the propagation of the statistical uncertainty on λ is, in most of the range, much smaller than the total uncertainty that also takes in to account the uncertainty on the parameters that model w(s), according to the estimate from [10]. This contribution to the total uncertainty is particularly large as the values of R t depart from one. For R t = 1, as noted before, the uncertainty  Fig. 3. Rt evaluated on the public COVID-19 Italian data released by the Italian Dipartimento di Protezione Civile with assuming a constant generation time, assuming a mean value and a standard deviation contribution, and assuming a gamma distribution. The assumed parameters are taken from [10].
contribution form the parameters that model w(s) is null. The magnitude of the total uncertainty is comparable with what is obtained from the algorithm by Cori et al. that tales into account the uncertainty on w(s).

Conclusion
A simplified method to determine an estimate of R t based on a local exponential fit is presented. The method can be applied assuming a fixed generation time, including the contribution of the standard deviation of the generation time distribution, or assuming a functional form for the probability distribution of the generation time. If a gamma distribution is assumed, a simple analytic solution is reported. The method offers some advantages compared to the simplified method adopted by the Robert Koch Institut, yet preserving good computing performances that makes it suitable for a real-time evaluation.
Results of the method applied to the public Italian COVID-19 data have been presented. The proposed method shows a good agreement with other, more complex, algorithms available in literature and implemented in public software packages.
We note a close relation between R t and the doubling time of the number of infections τ 2 , or equivalently the growth rate λ. In particular, the condition R t > 1 is equivalent to τ 2 > 0 or λ > 0. Since the determination of R t is affected by additional uncertainty sources compared to τ 2 , we consider τ 2 or λ to be a more sound and simpler indicator of the condition of growing or receding epidemic compared to R t , while R t may have more importance in other contexts of epidemiological interest.
We publish in real time daily estimates of R t as computed by our algorithm and by all the other algorithms quoted in this article for the cases in Italy and all the Italian regions under [9]. Daily values for the major world countries are also reported. Italy -R t -05/12/2020 date R t Fig. 6. Growth rate λ (top) and Rt (bottom). For the growth rate λ, the statistical uncertainty band at 90% Confidence Level is shown. For Rt the contribution to uncertainty due to the propagation of the statistical uncertainty on λ at 90% Confidence Level is shown together with the total uncertainty at the 68% and 90% Confidence Level, that also takes in to account the uncertainty on the parameters that model w(s). All data refer to Italy according to public COVID-19 Italian data from the Dipartimento di Protezione Civile.