An improved mathematical prediction of the time evolution of the Covid-19 pandemic in Italy, with a Monte Carlo simulation and error analyses

We present an improved mathematical analysis of the time evolution of the Covid-19 pandemic in Italy and a statistical error analyses of its evolution, including a Monte Carlo simulation with a very large number of runs to evaluate the uncertainties in its evolution. A previous analysis was based on the assumption that the number of nasopharyngeal swabs would be constant. However, the number of daily swabs is now more than five times what it was when we did our previous analysis. Therefore, here we consider the time evolution of the ratio of the new daily cases to number of swabs, which is more representative of the evolution of the pandemic when the number of swabs is increasing or changing in time. We consider a number of possible distributions representing the evolution of the pandemic in Italy, and we test their prediction capability over a period of up to 6 weeks. The results show that a distribution of the type of Planck black body radiation law provides very good forecasting. The use of different distributions provides an independent possible estimate of the uncertainty. We then consider five possible trajectories for the number of daily swabs and we estimate the potential dates of a substantial reduction in the number of new daily cases. We then estimate the spread in a substantial reduction, below a certain threshold, of the daily cases per swab among the Italian regions. We finally perform a Monte Carlo simulation with 25,000 runs to evaluate a random uncertainty in the prediction of the date of a substantial reduction in the number of diagnosed daily cases per swab.


Introduction
In a previous paper, we estimated the possible dates of a substantial reduction in the daily number of new cases of the Covid-19 pandemic based on the assumption that the number of nasopharyngeal swabs would remain roughly constant [1,2]. At the time of our previous analysis (March 26), the average daily number of swabs from February 25 was about 9000 per day. However, from March 27 up to May 16, the average number of daily swabs was about 50,600 (recently on May 1, the number of daily swabs reached the maximum of 74,208). Therefore, to study the evolution of the Covid-19 pandemic, we have to consider the analysis a e-mail: ignazio.ciufolini@gmail.com (corresponding author) b e-mail: antonio.paolozzi@uniroma1.it of the ratio of daily cases per swab. To possibly mathematically predict the evolution of the pandemic in Italy, we can fit the ratio of cases per swab using several different distributions. Some distributions are more suitable than others for forecasting the future behavior of the pandemic. We consider the following distributions: Gauss [3], Beta [3], Gamma [3], Weibull [4], Lognormal [5] and two versions of the Planck black body radiation law [6]. The number of parameters chosen in these distributions is either two or three, depending on the distribution. It turns out that a Planck law distribution with three parameters, along with the Gamma distribution with three parameters, provide the best fits and best predictions.
Furthermore, since the number of daily swabs depends on factors that are unknown to us, such as the daily availability of reagents and specialized personnel, we consider five possible trajectories for the daily number of swabs. (We have also considered some other time evolutions in the number of daily swabs, which for brevity we do not report here.) We fit the time evolution of the cases per unit swab up to April 25, using the two best-fit-prediction distributions, i.e., Planck with three parameters and Gamma, along with a Gauss distribution. After analyzing the time evolution of the cases per unit swab using these three distributions and five conceivable trajectories of daily swabs, we estimate the evolution in the number of new daily cases and the dates of a substantial reduction in such a daily number. Furthermore, in Sect. 3.1, we estimate the spread in the dates of a substantial reduction in the number of daily cases per swab among the regions of Italy, where the conditions are quite different from each other, including the number of swabs per person.
The different distributions that we use provide a possible independent way to estimate the uncertainty. Indeed, a basic problem is to mathematically estimate the uncertainty in the date of a substantial reduction of daily cases. For the purpose of estimating the random uncertainties, in Sect. 3, we report the results of a Monte Carlo simulation with 25,000 runs.

A mathematical analysis of the ratio of new daily cases per swab
After analyzing the time trend of the ratio of new daily cases to the number of daily swabs [7-9], we found [1,2] that this trend can be modeled by a Gauss distribution, however, the time trend has also a certain amount of skewness that can better be fitted by choosing a skewed distribution such as the Weibull, Lognormal, Beta and Gamma distributions, and also other distributions of the type of the Planck black body law. This last one with three parameters, reported in Fig. 1, shows the best fit and the best prediction capabilities. Fig. 2a-e reports the fits of the data with a distributions of the type of the Planck black body law with two parameters (i.e., with the exponent of the time variable t equal to 3 as in the Planck black body law), and with distributions of the type of the Gamma, Beta, Weibull, Lognormal, respectively. The data can also be approximated by a function of the type of a Gauss function with three parameters, as shown in Fig. 2f.

Prediction capabilities of the distributions fitting the data
As a relevant test to select the distribution which is more suitable to predict the evolution in time of the new daily cases per swab, we consider three different intervals of daily ratios: from February 25 until May 2, from February 25 until April 18 and from February 25 until April 4, i.e., 2 weeks before the date of May 16, 4 weeks before May 16 and 6 weeks before May 16, respectively. We then fit the daily ratios during these three periods with each of the seven distributions and finally test which one gives the best predictions for the measured ratios up to May 16, from May 2, from April 18 and from April 4, respectively. We test  (Table 1) and by calculating the lowest absolute value of the mean and Root Mean Square (RMS) of the residuals (difference between the measured daily data and the fitting function), as reported in Table 2. As shown in Tables 1 and 2, the distributions providing the best predictions are the Planck, Beta and Gamma distributions. The Weibull distribution does not fit well the data.

Analysis of new daily cases per swab of Covid-19 in Italy and Monte Carlo simulation
In the present section, we first analyze the expected dates of a substantial reduction in the ratio of the daily diagnosed cases per swab in Italy, and we then perform a Monte Carlo simulation with 25,000 runs to evaluate the random uncertainties in these dates. Using the best fitting distribution, i.e., a Planck distribution with three parameters (see previous Sect. 2) and Gamma-type and Gauss-type distributions, we find the dates when we expect that the ratio of daily diagnosed cases per swab will be below certain thresholds. We choose these thresholds to be equal to the minimum measured ratio of cases per swab (in the range February 25 to May 4), equal to 0.02163, incidentally corresponding to the first available datum of such ratio occurred on February 25; one-fifth of it and one-tenth of it. These dates are summarized in Table 3.
However, several uncertainties can influence the number of the new daily cases of the Covid-19 pandemic, in addition to the number of nasopharyngeal swabs that have increased with time as explained in the previous section, and to the change in the basic social distancing measures. Another source of uncertainty is related to the fact that the number of swabs is associated both to new cases and to the repeated ones of persons already tested as positive. However, if we assume that the percentage of re-tested people is approximately constant such error should be small.   Italy that day. This fit is obtained with a function of the type of a Gamma distribution [3], , with three parameters A, B and C, where: The beginning date is February 25. Root Mean Square of the residuals is 0.0390. c Fit of the ratio of the number of new daily Covid-19 detections in Italy via nasopharyngeal swab tests on a given day divided by the number of nasopharyngeal swab tests given in Italy that day. This fit is obtained with a function of the type of a Beta distribution [3], , with two parameters α and β, where: To possibly estimate the uncertainties in the number of new daily cases per swab, we use a Monte Carlo simulation [3,10,11] with 25,000 runs, similar to what was done in our previous works [1,2], and the differences in the predictions among the best distributions considered previously. The Monte Carlo simulation should account mainly for random uncertainties. The uncertainty we consider in the Monte Carlo simulation does not take into account that the real number of cases (which is unknown) can be one order of magnitude larger, or even more than the daily diagnosed cases. However, it is usual in statistics to use a sample as being representative of the population under study and the present Monte Carlo simulation is performed for the number of diagnosed daily cases per swab.
For the convenience of the reader, we summarize here the procedure used previously [1,2] for the total number of cases which is instead applied here to the ratio of cases per swab. The number of runs has been largely increased from 150 to 25,000. We assume a measurement uncertainty in the daily number of diagnosed cases per swab equal to 20% of each daily ratio (Gaussian distributed). Then, a random matrix (m × n) is generated, where n (columns) is the number of observed days and m (rows) is the number of random outcomes, which we have chosen to be 25,000. Each number in the matrix is part of a Gaussian distribution with mean equal to 1 and sigma equal to 0.2 (i.e., 20% of 1), either row-wise and column-wise. So, starting from the n nominal values of the daily data per swab, we generated n Gaussian distributions with 25,000 outcomes, with mean equal to the n nominal values and with 20% standard deviation. Then, for each of the 25,000 simulations, those n values (corresponding to the daily cases per swab of n days) are fitted with a three parameter function of the type of a Planck function (see Sect. 2), and we then determine the date at which the number of new daily cases will be less than a certain threshold that, for example, we choose to be equal to     Fig. 3 Histogram of the 25,000 runs of the Monte Carlo simulation for Italy: frequencies versus the day in which a substantial reduction in the daily cases per swab is less than 0.004326, i.e., less than one-fifth of the minimum value of cases per swab measured during the period February 25 to May 4 the minimum measured ratio of cases per swab (in the range February 25 to May 4), equal to 0.02163, one-fifth of it and one-tenth of it. Finally, we calculate the mean and the standard deviation of the 25,000 simulations for these three cases. The results are reported in Table 4. Fig. 3 reports the histogram of the frequencies versus the day of a substantial reduction in the number of daily cases per swab corresponding to less than 0.004326, i.e., to less than one-fifth of the minimum value of cases per swab measured during the period February 25 to May 4. The histogram approaches a Gaussian with mean equal to day 93, approximately corresponding to what reported in Fig. 1. The standard deviation is approximately 3 days.

Analysis of each region of Italy
We independently analyze each of the twenty Italian regions from February 25, 2020 (included), until April 25, 2020 (included). As in the previous sections, also here, we consider the ratio of daily cases over the number of daily swabs. Indeed, the number of daily swabs and other relevant conditions vary quite differently from one region to the other, so that we can get an indication of the spread between different regions. To calculate such spread, we evaluate the date of the reduction of the daily cases per swab in each region below a certain threshold which is chosen to be the minimum value of cases per swab in each region in the range under study (25 February-25 April). We then fit the daily ratios of each region using a function of the type of the Planck law with three parameters, and finally, for each region, we obtain the date at which the number of daily cases per swab reduces below the given threshold for each region. In Fig. 4, we report the 20 dates corresponding to the 20 regions. The calculation of the spread of the 20 dates provides a 1-sigma standard deviation of 11.4 days. Such spread might represent some systematic uncertainties in our mathematical predictions, using distributions with constant parameters, since social distancing measures vary in place and time from one region to another.
This result may also be displayed using the central limit theorem. We choose a sample of 30 regions, with repetition, out of the 20 regions, for 1,200,000 times. We then calculate the mean of each sample and we report the frequencies of the mean of the 1,200,000 samples in the histogram of Fig. 5. We obtain a Gaussian with a standard deviation of 2.03 days, that, multiplied for the square root of 30, gives back a standard deviation of approximately 11.4 days as shown in Fig. 4. This represents the fact that each region has quite different conditions with regard to the intensity and phase of the pandemic, the number of daily swabs, the social distancing measures and other factors.  Table 5 reports those five cases for a Planck distribution and the Gamma distribution, i.e., the best distributions with regard to prediction capability as reported in Tables 1 and 2, compared to a Gauss distribution which gives an earliest bound for the predicted date of a substantial reduction in the number of daily cases. In our previous work [1,2], a substantial reduction of the daily cases was chosen to occur when the number of cases reduced to about 100. However, since the number of daily swabs is significantly changing and thus the number of diagnosed cases increases as the number of swabs, we normalize 100 by multiplying it by the ratio: "number of swabs divided by 9000." The number 9000, as mentioned earlier, is approximately the average number of swabs up to March 26. Table 5 shows the dates of a substantial reduction of new daily cases predicted with the three different distributions. The date of a substantial reduction of cases in the population should be independent of the number of swabs since it describes the real evolution of the pandemic at a certain stage. For this reason, we introduce the normalization just described. In other words, the different numbers of daily cases reported in the first column of Table 5 would represent the same actual condition of the pandemic, i.e., 100 cases obtained with 9000 swabs would, e.g., be equal to 500 cases obtained with 45,000 swabs.
Since the number of daily swabs was increasing after March 26 by a factor of more than 5.6 with respect to our previous analysis, a substantial decrease in the number of new daily cases is reached later with respect to our previous mathematical prediction [1,2]. As mentioned at the beginning of this section, a better indication of the evolution of the pandemic is provided by the behavior of new daily cases per swab as reported in Figs. 1 and 2 and in Sect. 3. Indeed, Fig. 6 Three-dimensional representation of the number of daily cases as a function of the number of daily swabs (from 0 to 100,000) and of the day, from February 25, for a distribution of the type of Planck law with three parameters used to describe the new daily cases per swab the maximum value of the number of new daily cases per swab was 0.462 reached on March 9, whereas the minimum measured value (in the range February 25 to May 4) was equal to 0.02163. According to our previous work [1,2], the pandemic should have significantly reduced during the end of April. Indeed, on May 1, the ratio of daily cases per swab was 0.0265 (i.e., nearly its measured value at the beginning of the pandemic in Italy) and on May 15 the ratio was even smaller, reaching the historical minimum (from February 25 to May 16) of 0.0116.
In Fig. 6, we report a 3D representation of the number of new daily cases as a function of the number of daily swabs, from 0 to 100,000, and of the day, from February 25, using a Planck law distribution.

Conclusions
Since the number of daily swabs was rapidly increasing in Italy after March 26, we fit the ratio of the new daily cases per swab using several functions, including the Gaussian, Weibull, Lognormal, Beta and Gamma distributions and a Planck law function. 1 Indeed, using the best fitting distribution, i.e., the Planck distribution with three parameters, and depending on the chosen threshold for the daily cases per swab (historical minimum from February 25 to May 4, or one-fifth of it, or one-tenth of it), we found that a substantial reduction in the daily cases per swab will occur between May 5 and June 5.
Furthermore, considering that is practically impossible to predict the evolution of the number of daily swabs, we consider five possible relevant trajectories for the number of daily swabs. By considering these five trajectories and the Gauss, Gamma and Planck distributions, the range of dates for a substantial decrease in the number of new daily cases goes from April 27 (in agreement with our previous findings) to May 16, depending on the chosen distribution.
To characterize a substantial decrease in the pandemic, we have assumed a threshold of 100 cases per day when an average of 9000 swabs per day are used. However, if, for instance, the number of daily swabs is instead 67,000 (the maximum number of swabs per day up to April 25), the corresponding indication of that substantial decrease in the pandemic is given by a much higher number of cases, i.e., 744. Indeed, since the actual number of new daily cases is much higher than the measured ones, by increasing the number of daily swabs the number of new daily cases will also increase.
To estimate the random uncertainty in the dates of a substantial reduction of the pandemic in Italy, we have used a Monte Carlo simulation, with 25,000 runs, which provides a 1-sigma random uncertainty of approximately 3 days (calculated for a threshold of one-tenth of its historical minimum between February 25 and May 4). In addition, we also estimated the spread in a substantial reduction below a certain threshold of the daily cases per swab among the Italian regions.