A Predictive Model for the Evolution of COVID-19

We predict the evolution of the COVID-19 pandemic in several countries using a logistic model. The model uses a regression analysis based on least-squares fitting. In particular, the growth rate of the infection is fitted as an exponential decay, rather than the linear decay reported previously in logistic models. The model has been validated with the data of China and South Korea, where the pandemic is nearing its end. The data of Italy, Germany, Spain, and Sweden show that the peak of the infection has been reached, i.e. the time after which new infections start to decrease compared to the previous day. The model predicts the approximate number of total infections at the end of the outbreak. The possible peak date and the total number of infections for different countries are predicted using the data available. The total number of infections in the USA is estimated to be around 4 million. The model prediction for Brazil shows that the peak will be reached on 5 July 2020 and that total infections will be 3.2 million. The reported data of India show a large initial scatter in the growth rate. The total number of infections in India is estimated by the model to be around 2.4 million, and the predicted peak date is 3 August 2020. The predictions for India are discussed in the context of the restricted movement of the population, i.e. the lock-down imposed by the government.


Introduction
The COVID-19 pandemic, caused by the SARS-CoV-2 virus, has affected most countries around the world and has presented several challenges to humanity and science. A simple predictive model can help governments to devise or modify policies while designing mitigation measures, and to use the healthcare infrastructure judiciously. Several factors influence the evolution of COVID-19, such as movement restriction of people by the government (lock-down), outdoor weather, and change in the virulence of the virus with time. In a recent study, Bhardwaj and Agrawal (2020) showed that the growth rate is weakly correlated to the outdoor weather during the evolution of the pandemic, since the latter influences the drying time of respiratory droplets on a surface. Since these factors may vary from one country to another, a predictive model applied to different countries could help in understanding their effect on the evolution of an infectious disease.
The epidemiological models are classified as exponential, logistic, and susceptible-infectious-recovered (SIR) models (Renshaw 1993; Ranjan 2020). Previous studies have proposed models in these categories to predict the outcome and evolution of COVID-19 in different countries. The exponential models are useful for predicting the initial growth of an infectious disease (Ranjan 2020). For instance, Verma et al. (2020) used a variant of the exponential model, fitting the rate of change of COVID-19 infections in different countries with exponential and power-law functions, and argued that the transition from an exponential regime to a power-law regime is a signature of the flattening of the curve. The logistic models have been used to predict the decay of the infections, but such a model may fail in the initial stages (Ranjan 2020). Vattay (2020) employed a logistic model for predicting the death rate in Italy and assumed a linear form for the growth rate of the infection. Similarly, Batista (2020) employed a logistic growth regression model to predict the total infections and peak time of the pandemic, modeling the growth rate as a linear decay. Quadrat (2020), considering the data of France, showed that a logistic model is a better candidate to predict the pandemic evolution, specifically when a restriction on the movement of the population, i.e. a lock-down, is imposed. Ranjan (2020) used exponential and classic SIR models based on the available data and predicted that the basic reproduction number ($R_0$) for India is in the range of 1.4-3.9. Based on the SIR model, this study predicted that the peak of infection will occur at the end of May in India if community transmission does not occur.

While several predictive logistic models have been reported previously (e.g. Vattay 2020; Batista 2020; Quadrat 2020), the present modeling effort aims to develop a data-driven extrapolation model that could render more accurate predictions. A previous logistic model used a linear fit for the growth rate of the infection and could not predict the data accurately (Vattay 2020). In the present work, we extend the model in Vattay (2020) and use an exponential function to fit the growth rate of the infection. The present results show that a logistic model, with the growth rate modeled as an exponential decay, can capture and predict the evolution of COVID-19 in several countries with reasonable fidelity.

Methods
The evolution of a disease in a given human population is governed by an ordinary differential equation (Vattay 2020). Considering the unit of time as a day, and since the data are available for each day, we use Δt = 1 in its discretized form. We estimate the growth rate at each day $t_i$ from the data of different countries available in the public domain (https://coronavirus.jhu.edu, https://www.worldometers.info/coronavirus and https://www.covid19india.org), and fit the growth-rate data using a regression analysis based on least-squares fitting. The fitting results in the present work show that an exponential fit works well with the given data, as compared to a linear fit. The initial scatter in the growth rate was ignored while fitting the growth rate with an exponential function. The start day for the fitting of the curve corresponds to around 1% of the total infections on the present day.

Using the fit obtained, we extrapolate the number of infections that will occur in the future, for $t > t_p$, using the fitted value of the growth rate. The initial value of this extrapolation is taken as $N_p$. This allows us to predict the time of the peak of infections, i.e. the time after which the daily infections start to reduce. The total number of infections is also predicted with the same extrapolation. The peak time is obtained by plotting $N_{t_i} - N_{t_{i-1}}$ against time, and the end of the outbreak is considered to be reached when the growth rate at $t_i$ drops to 0.001-0.01, i.e. the daily rise in the total number of infections falls below 0.1-1%. The total infections are estimated at the end of the outbreak.
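The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used for the results reported here. The daily growth rate is assumed to be the relative rise $(N_{t_i} - N_{t_{i-1}}) / N_{t_i}$, the exponential decay is fitted by an ordinary linear least-squares regression on the logarithm of the growth rate, and all function names and the parameters `a`, `b`, `end_rate` and `max_days` are hypothetical.

```python
import math


def growth_rates(cumulative):
    """Relative daily rise in cumulative infections (time step of one day).

    ASSUMPTION: the rate on day i is (N_i - N_{i-1}) / N_i; this
    normalization is a choice made for the sketch.
    """
    return [(cumulative[i] - cumulative[i - 1]) / cumulative[i]
            for i in range(1, len(cumulative))]


def fit_exponential_decay(days, rates):
    """Least-squares fit of rate = a * exp(-b * day).

    ASSUMPTION: the fit is a linear regression on log(rate).
    Non-positive rates are skipped because their logarithm is undefined.
    """
    points = [(d, math.log(r)) for d, r in zip(days, rates) if r > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # a, b


def extrapolate(n_present, day_present, a, b, end_rate=0.001, max_days=5000):
    """March forward from the last data point until the fitted rate <= end_rate.

    Returns (days, totals, daily) for the extrapolated days only.
    """
    days, totals, daily = [], [], []
    day, total = day_present, n_present
    while len(days) < max_days:
        rate = a * math.exp(-b * (day + 1))
        if rate <= end_rate:
            break  # end of the outbreak under the chosen threshold
        day += 1
        # Inverting rate = (N_i - N_{i-1}) / N_i gives N_i = N_{i-1} / (1 - rate).
        new_total = total / (1.0 - rate)
        days.append(day)
        totals.append(new_total)
        daily.append(new_total - total)
        total = new_total
    return days, totals, daily


def peak_day(days, daily):
    """Day on which the daily increment N_i - N_{i-1} is largest."""
    return days[daily.index(max(daily))]
```

In this sketch, the last entry of `totals` plays the role of the total infections at the end of the outbreak, and `peak_day` locates the maximum of the daily increments. If the peak already lies within the reported data, the daily increments of the data would have to be searched as well.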

Model Validation
The model has been validated with the data of China and South Korea (https://coronavirus.jhu.edu). These two countries have been selected since the outbreak of the pandemic is nearing its end there. Figures 1 and 2 show that the time history of the growth rate of the two countries and the fitted exponential decays are in good agreement.
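One simple way to quantify the agreement between a fitted decay and the growth-rate data is to compare the sum of squared residuals of a linear fit with that of an exponential fit. The sketch below is a hypothetical illustration of such a check, not the analysis behind Figs. 1 and 2. It assumes all growth rates are positive and fits the exponential by regression on the logarithm of the rate; the function names are illustrative.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sum_squared_residuals(observed, predicted):
    """Sum of squared differences between data and fitted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def compare_fits(days, rates):
    """Return (ssr_linear, ssr_exponential) for the same growth-rate data.

    ASSUMPTION: every rate is positive, so that log(rate) is defined.
    """
    # Linear decay: rate = m * day + c
    m, c = linear_fit(days, rates)
    linear_pred = [m * d + c for d in days]
    # Exponential decay: log(rate) = k + s * day, i.e. rate = exp(k) * exp(s * day)
    s, k = linear_fit(days, [math.log(r) for r in rates])
    exp_pred = [math.exp(k + s * d) for d in days]
    return (sum_squared_residuals(rates, linear_pred),
            sum_squared_residuals(rates, exp_pred))
```

A smaller residual sum for the exponential fit would support the choice of an exponential decay over a linear one for a given country's data.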

Results and Discussion
First, we plot the results of four countries: Italy (Fig. 3), Germany (Fig. 4), Spain (Fig. 5) and Sweden (Fig. 6), where the pandemic reached its peak in March or April 2020 and the growth has slowed down. The data used in the model are from 22 January 2020 to 10 April 2020. Time = 0 corresponds to 22 January 2020 in all figures. The peak date of the infection obtained from the data (https://www.worldometers.info/coronavirus) and the predicted number of infections are listed for these countries in Table 1. The predicted number of infections is of the same order in Italy, Germany, and Spain. The number of infections in Sweden is predicted to be one order of magnitude lower than in Italy, Germany, and Spain, despite the absence of a lock-down in Sweden.

Second, we plot the results of the USA, where the peak of the infections was reached in April 2020. The data used in the model are from 22 January 2020 to 10 April 2020. Figure 7 plots the data of the USA, and the peak of the pandemic is predicted to be around 3 May 2020. The data available (https://www.worldometers.info/coronavirus) on 1 June 2020 show that the peak date was 24 April 2020. Therefore, the model predicts the peak date of the USA with reasonable fidelity (with an error of 9 days). The total number of infections predicted by the model for the USA is around 4.2 million (Table 1).
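The conversion between the day index used in the figures and calendar dates, and the error in the predicted peak date for the USA, can be checked with simple date arithmetic. The snippet below is a minimal sketch; the helper names are hypothetical, while the dates are those quoted in the text.

```python
from datetime import date, timedelta

DAY_ZERO = date(2020, 1, 22)  # Time = 0 in all figures


def day_index_to_date(day_index):
    """Calendar date corresponding to a day index on the time axis."""
    return DAY_ZERO + timedelta(days=day_index)


def date_to_day_index(d):
    """Day index on the time axis corresponding to a calendar date."""
    return (d - DAY_ZERO).days


predicted_peak = date(2020, 5, 3)   # model prediction for the USA
reported_peak = date(2020, 4, 24)   # peak date from the data on 1 June 2020
error_days = abs((predicted_peak - reported_peak).days)
print(error_days)  # 9
```

With this convention, the last day of the data used for these countries, 10 April 2020, corresponds to day 79.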
Third, we plot the results of two countries, Brazil and India, where the peak of the infections is expected after April 2020. The data used in the model for these two countries are from 22 January 2020 to 31 May 2020. Figure 8 plots the data of Brazil, and we note that the predicted peak of the pandemic is on 5 July 2020, with a total number of infections of around 3.2 million (Table 2).
Finally, the data for India are plotted in Fig. 9. The growth-rate data in Fig. 9 (first row) show considerable scatter until around Day 75, after which the growth rate decays exponentially. The model prediction shows that the peak will occur on 3 August 2020 (Table 2), with total infections of around $2.4 \times 10^6$. The plot of daily infections against time shows that the curve is somewhat broader than that of other countries. A complete lock-down, imposed early in the outbreak from 22 March 2020, may be a reason for the slower outbreak. The initial scatter in the data could be attributed to cluster transmission in the early phase; once community transmission was established, the growth rate decay became smoother, after around Day 75. Therefore, a broader peak and a slow decay of the growth rate are possible signatures of the effect of a strict lock-down imposed by the government.

The model presented here has certain limitations. It assumes a constant human population with uniform mixing of the people and cannot predict the total number of fatalities or recoveries. Recovery from the infection, which is typically considered in classical SIR models, is not included. Recovery could reduce the active number of infections and thereby the growth rate. The model does not directly account for the mitigation measures, such as movement restriction of the population, i.e. lock-down, taken by the respective governments. However, the lock-down has an indirect effect on the growth rate, which is accounted for in the model by treating the growth rate as an exponential decay. The model predictions are based on the reported data, which serve as an essential input to the model. In general, the predictions improve as time progresses since more data become available (essentially, the model is data-driven, like a weather prediction model).

Conclusions
A logistic model to predict the evolution of the COVID-19 pandemic is presented. This predictive mathematical model uses a regression analysis based on least-squares fitting. Specifically, the growth rate of the infection is fitted as an exponential decay, as compared to a linear decay. The model validation is carried out with the data of China and South Korea, where the pandemic is nearing its end. We predict the date of peak infections, i.e. the date after which new infections start to decrease compared to the previous day, and the total number of infections at the end of the outbreak. Data from several countries, namely Italy, Germany, Spain, Sweden, the USA, Brazil, and India, are plotted for making predictions. All tested countries except Italy show a large initial scatter in the growth rate. The data of Italy, Germany, Spain, and Sweden show that the peak of the infection has been reached, and predictions are made for the total infections. According to the model, the cumulative infections in the USA at the end of the outbreak will be around 4 million. The model prediction for Brazil shows that the peak date is 5 July 2020 and total infections are 3.2 million. The respective predictions for India are 3 August 2020 and 2.4 million. The predictions for India are discussed in the context of the lock-down imposed by the government. A broader curve for daily infections against time and a slower decay of the growth rate in the case of India are found to be signatures of the effect of a strict lock-down. The limitations of the present model are briefly discussed.

Table 2 Predicted peak date of COVID-19 and the total number of predicted infections in India and Brazil. The data used in the model to predict the peak date and total infections are from 22 January 2020 to 31 May 2020

Country   Predicted peak date   Predicted total infections   Remarks
Brazil    5 July, 2020          3.2 × 10^6                   Partial lock-down
India     3 August, 2020        2.4 × 10^6                   Lock-down