Constructing unbiased estimators, e.g. for the true number of infected individuals at some point in time or for the mortality rate in an epidemic, from the data that are routinely collected under the circumstances is feasible but not trivial. Given the importance of such estimates in decision making, it is essential to understand what the estimation challenges are.
One concern during an epidemic is the estimation of the mortality rate. At the end of the epidemic, the quotient \(Deceased/(Deceased+Recovered)\) represents a realization of the random variable whose expected value is sought, and thereby an unbiased estimator thereof. While the epidemic develops, however, it is unclear against which reference number the number of casualties should be compared, since the daily number of infections evolves, and the transition rate from infectious to recovered is generally quite different from the transition rate from infectious to deceased. To obtain an unbiased estimator of mortality, the above quotient should therefore be computed for the subset of individuals who were infected before a given date, once all cases of that subset are closed; such a statistic, however, is unobtainable. The closest one can come is to consider the individuals whose illness was confirmed before a certain date, once all their cases are closed. Such a statistic is feasible, but generally not readily available. Making it available would offer an unbiased estimator of the mortality rate, within the intrinsic standard deviation of that variable. In its absence, a lower and an upper limit of the mortality rate estimate can be established from current-time data by considering the two extreme situations, i.e. that all infected individuals \(I(t)\) existing at the present time \(t\) evolve either to the recovered (R) state or to the deceased (D) state. In the first case, the lower estimation limit is \(D(t)/[D(t)+I(t)+R(t)]\), whereas in the second case the upper estimation limit is \([D(t)+I(t)]/[D(t)+I(t)+R(t)]\), leaving a rather wide margin of uncertainty during the epidemic.
It should be noted that when \(I(t)\rightarrow 0\) at the end of the epidemic, both limits tend to the unbiased estimator. During the epidemic, we shall use the estimator with respect to the closed cases to date, i.e. \(D(t)/[D(t)+R(t)]\), in spite of the aforementioned difference in transition rates.
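As a minimal sketch of the limits discussed above, the following Python function computes the lower limit, the closed-case estimator and the upper limit from the three counts \(D(t)\), \(I(t)\) and \(R(t)\). The function name, the field names and the example counts are illustrative assumptions and are not taken from reported data.

```python
from typing import NamedTuple


class MortalityBounds(NamedTuple):
    lower: float   # D / (D + I + R): every active case eventually recovers
    closed: float  # D / (D + R): closed cases only
    upper: float   # (D + I) / (D + I + R): every active case eventually dies


def mortality_bounds(deceased: int, infected: int, recovered: int) -> MortalityBounds:
    """Return the lower limit, closed-case estimate and upper limit of mortality."""
    total = deceased + infected + recovered
    closed = deceased + recovered
    if total <= 0 or closed <= 0:
        raise ValueError("at least one closed case is required")
    return MortalityBounds(
        lower=deceased / total,
        closed=deceased / closed,
        upper=(deceased + infected) / total,
    )


# Hypothetical counts, chosen only to illustrate the behaviour.
mid_epidemic = mortality_bounds(deceased=50, infected=900, recovered=50)
end_of_epidemic = mortality_bounds(deceased=50, infected=0, recovered=950)
print(mid_epidemic)
print(end_of_epidemic)
```

With many open cases the interval between the two limits is wide; once \(I(t)=0\), all three values coincide, which mirrors the limiting behaviour described above.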
Taking into account the above considerations, the effects of undersampling, particularly during the period of initial exponential growth of an epidemic, become evident. The following example, taken from the current pandemic, shows how exhaustive sampling in a given community or country can be used to mitigate undersampling elsewhere. At \(t=\) April 27, 2020, the reported situation in the USA was the following (https://www.worldometers.info/coronavirus/country/us/): total recorded cases \(D(t)+I(t)+R(t)=987{,}322\), closed cases \(D(t)+R(t)=174{,}196\), of which deaths \(D(t)=55{,}415\) and recovered \(R(t)=118{,}781\). At the same time the reported situation in Germany was (https://www.worldometers.info/coronavirus/country/germany/): total cases \(D(t)+I(t)+R(t)=157{,}770\), closed cases \(D(t)+R(t)=120{,}476\), of which deaths \(D(t)=5{,}976\) and recovered \(R(t)=114{,}500\). One can notice the discrepancy in the \(D(t)/[D(t)+R(t)]\) ratios of \(\approx 0.32\) in the USA vs. \(\approx 0.05\) in Germany, with the latter being very close to the worldwide mortality rate that has been reported so far, i.e. 3–4% [see e.g. Wang et al. (2020), and https://www.worldometers.info/coronavirus/coronavirus-death-rate/].
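The closed-case ratios quoted above can be retraced with a few lines of Python. This is only a sketch: the dictionary layout and the function name are assumptions made for illustration, while the counts are those cited in the text for April 27, 2020.

```python
# Reported counts for April 27, 2020, as quoted in the text.
reported = {
    "USA": {"total": 987_322, "deaths": 55_415, "recovered": 118_781},
    "Germany": {"total": 157_770, "deaths": 5_976, "recovered": 114_500},
}


def closed_case_ratio(counts: dict) -> float:
    """D(t) / [D(t) + R(t)]: deaths as a share of closed cases."""
    return counts["deaths"] / (counts["deaths"] + counts["recovered"])


ratios = {country: closed_case_ratio(c) for country, c in reported.items()}
for country, ratio in ratios.items():
    closed = reported[country]["deaths"] + reported[country]["recovered"]
    print(f"{country}: closed cases = {closed:,}, D/(D+R) = {ratio:.3f}")
```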
A similar discrepancy also appears in the \([D(t)+I(t)]/[D(t)+I(t)+R(t)]\) ratios. More precisely, \([D(t)+I(t)]/[D(t)+I(t)+R(t)] \approx 0.88\) in the USA vs. \(\approx 0.27\) in Germany, signifying that the COVID-19 outbreak in the USA is at an early stage. Also, the much higher \(D(t)/[D(t)+R(t)]\) ratio in the USA (by a factor of approximately 6.5), relative to Germany where the outbreak is decelerating, means that there has been significant undersampling, as testing has been taking place at medical facilities, health care units, hospitals etc., where symptomatic cases are sampled at higher frequency. Applying the \(D(t)/[D(t)+R(t)]\) ratio of Germany to the USA, one can obtain a rough estimate of the number of closed cases based on the number of deaths. In this case, \(D(t)+\widehat{R(t)} = 55{,}415/0.05 \approx 1{,}100{,}000\), and one can use the current \([D(t)+R(t)]/[D(t)+I(t)+R(t)]\) ratio of the USA (i.e. \(174{,}196/987{,}322 \approx 0.18\)) to estimate the actual number of total cases. This calculation gives \(D(t)+\widehat{I(t)}+\widehat{R(t)} \approx 6{,}100{,}000\), which corresponds to an estimated value of \(\widehat{I(t)} \approx 6{,}100{,}000 - 1{,}100{,}000 = 5{,}000{,}000\) active cases, of which only \(987{,}322-174{,}196 = 813{,}126\) are detected and tracked. Under this setting, if we assume that the US economy reopens without constraints, and each undetected case infects one additional individual every 2.776 days on average (which is the value witnessed before the shelter-at-home order was issued in the USA), then within 10 days the total number of cases will be \(D(t)+\widehat{I(t)}+\widehat{R(t)}=(5{,}000{,}000-813{,}126)\times 2^{(10/2.776)}+987{,}322 \approx 50{,}000{,}000\), resulting in approximately \(3\%\times 50{,}000{,}000\approx 1{,}500{,}000\) deaths in total, of which only \(55{,}415\) have already been witnessed.
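The undersampling correction in the paragraph above can be retraced with a short Python sketch. The function and variable names are illustrative assumptions. The ratios are passed in rounded form (0.05 and 0.18), as in the text, so the results agree with the quoted figures only up to rounding; unrounded ratios would give somewhat different values.

```python
def corrected_case_counts(deaths: float, reference_ratio: float, closed_fraction: float):
    """Rescale reported deaths into estimated closed, total and active cases.

    reference_ratio: D/(D+R) borrowed from a well-sampled country.
    closed_fraction: (D+R)/(D+I+R) observed in the undersampled country.
    """
    est_closed = deaths / reference_ratio        # estimate of D + R
    est_total = est_closed / closed_fraction     # estimate of D + I + R
    est_active = est_total - est_closed          # estimate of I
    return est_closed, est_total, est_active


# USA deaths on April 27, 2020; rounded ratios as used in the text.
est_closed, est_total, est_active = corrected_case_counts(
    deaths=55_415, reference_ratio=0.05, closed_fraction=0.18
)
detected_active = 987_322 - 174_196  # reported total minus reported closed cases

print(f"estimated closed cases: {est_closed:,.0f}")
print(f"estimated total cases:  {est_total:,.0f}")
print(f"estimated active cases: {est_active:,.0f}")
print(f"detected active cases:  {detected_active:,}")
```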
If we assume instead that social distancing accompanied by extraordinary protection measures is effectively applied, and each undetected case infects one additional individual every 23.52 days on average (which is the value witnessed after the shelter-at-home order was issued in the USA), then within 10 days the total number of cases will be \(D(t)+\widehat{I(t)}+\widehat{R(t)}=(5{,}000{,}000-813{,}126)\times 2^{(10/23.52)}+987{,}322 \approx 6{,}600{,}000\), resulting in approximately \(3\%\times 6{,}600{,}000\approx 200{,}000\) deaths in total, of which only \(55{,}415\) have already been witnessed.
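The 10-day projections for the two doubling times can be compared side by side with the following Python sketch, which applies the formula from the text. The function name, the parameter names and the scenario labels are illustrative assumptions. The inputs are the rounded estimate of 5,000,000 active cases, the reported counts, and the 3% mortality rate used in the text.

```python
def projected_total_cases(est_active: float, detected_active: float,
                          reported_total: float, doubling_days: float,
                          horizon_days: float) -> float:
    """Undetected active cases double every `doubling_days`; reported cases are added unchanged."""
    undetected = est_active - detected_active
    return undetected * 2 ** (horizon_days / doubling_days) + reported_total


MORTALITY = 0.03  # mortality rate assumed in the text
scenarios = {"unconstrained": 2.776, "social distancing": 23.52}  # doubling times in days

projections = {
    name: projected_total_cases(
        est_active=5_000_000,
        detected_active=813_126,
        reported_total=987_322,
        doubling_days=days,
        horizon_days=10,
    )
    for name, days in scenarios.items()
}

for name, total in projections.items():
    print(f"{name}: ~{total:,.0f} total cases, ~{MORTALITY * total:,.0f} deaths")
```

Because the horizon enters only through the exponent \(10/T\), where \(T\) denotes the doubling time in days, the roughly eightfold difference in doubling time translates into a roughly eightfold difference in projected totals after 10 days.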