Introduction

COVID-19 pandemic has been impacting the life and economy across the globe since December 2019 and has caused major disruptions (Walker et al. 2020). As of August 2020, COVID-19 has infected nearly 20 million people across the globe with 90 countries in community transmission stage (World Health Organization) leading to significant efforts towards control (Rawaf et al. 2020), modelling (Barkur and Vibha 2020; Chatterjee et al. 2020a; Giordano et al. 2020), search for a cure (Le et al. 2020) for COVID-19 across the world and India (Chatterjee et al. 2020b; Singhal 2020). Keeping this in mind, in this paper, we analyze the evolution of COVID-19 cases and deaths in various Indian states. Specifically, we study and model the temporal evolution of infection and death counts for various time intervals and analyze their variations.

At the onset of the COVID-19 pandemic, India imposed the world’s strictest nationwide lockdown beginning from March 25, 2020 (Lancet 2020). However, preparedness and impact of the lockdown varied across states depending upon past experiences such as the Nipah virus in Kerala or Odisha’s disaster response due to recent natural disasters (Lancet 2020; Dore 2020). Therefore, attempts have been made to study the impact of COVID-19 in India. Sardar et al. (2020) mathematically assessed the impact of the first 21 days of the lockdown in terms of the total number of cases. Tomar and Gupta (2020) employed deep learning to provide a 30 day forecast of the death cases and recovered cases. Chatterjee et al. (2020a) provided estimates on the growth of infections using nonpharmacological interventions such as social distancing and lockdown. Network-based epidemic growth models have also been evolved for modeling COVID-19 pandemic (Marathe and Vullikanti 2013).

Epidemiological models, e.g., SEIR model, are being evolved to suit the national conditions (Bjørnstad 2018; Daley and Gani 2001; Labadin and Hong 2020; López and Rodo 2020; Peng et al. 2020). A model based on delay-differential equations considers the effects of past events (Shayak and Rand 2020). Ranjan (2020) studied the effects of various factors in the dynamics of epidemic spread. Due to lack of ample historical data ,many models for studying COVID-19 are appearing everyday (Chauhan et al. 2020; Bhardwaj 2020; Tiwari 2020; Sharma and Nigam 2020). However, none of them is able to model the epidemic pattern to sufficient accuracy (Holmdahl and Buckee 2020).

Furthermore, predictive models are based on the underlying patterns of COVID-19 data (Petropoulos and Makridakis 2020; Verma et al. 2020). Note however that the patterns of COVID-19 cases vary due to the extent of government measures (Hale et al. 2020a, b). Consequently, forecasting COVID-19 is quite complex.

Verma et al. (2020) and Chatterjee et al. (2020c) analyzed infection counts of 21 leading countries. They observed the emergence of power-laws after an initial exponential phase. They showed that China and South Korea followed power-law regimes—\(t^2\), t, \(\sqrt{t}\)—before flattening their epidemic curves. Also, the infection data for European countries (Spain, France, Italy, and Germany), USA, and Japan followed a power-law regime (\(t^n\), \(1\le n \le 4\)). They attributed these characteristics to long-distance travel and asymptomatic carriers. They concluded that \(\sqrt{t}\) regime is a common feature among all infection curves that exhibit saturation.

In this paper, we extend the works of Verma et al. (2020) and Chatterjee et al. (2020c) to the severely affected Indian states. We observe that some states exhibit \(t^2\) and \(t^3\) growth phases, while some others have linear or \(\sqrt{t}\) growths. Bihar, Karnataka, and Kerala appear to have second waves of infections. These findings will be useful to the epidemic control panel. We discuss our results in “Analysis and Results” section and conclude in “Discussions and Conclusions

Fig. 1
figure 1

Semi-logy plots of total infection count (I(t)) vs. time (t) (red thin curves) and \(\dot{I}(t)\) vs. t (blue thick curves) for the eleven states individually and consolidated for north-eastern Indian states. The dotted curves represent the best-fit curves. Refer Table 1 for the best-fit functions

Fig. 2
figure 2

Semi-logy plots of total death cases (D(t)) vs. time (t) (red thin curves) and \(\dot{D}(t)\) vs. t (blue thick curves) for six states of India. The dotted curves represent the best-fit curves (see Table 2)

Analysis and Results

In this paper, we analyze the COVID-19 infection and death counts in nineteen Indian states: Maharashtra, Tamil Nadu, Delhi, Gujarat, Uttar Pradesh, Rajasthan, Madhya Pradesh, West Bengal, Karnataka, Bihar, and Kerela. We combine the data of all the north-eastern (NE) states (Arunachal Pradesh, Assam, Manipur, Meghalaya, Mizoram, Nagaland, Sikkim, and Tripura) because the counts for each of them is rather small for any statistical analysis. As of August 4, 2020, the above states constitute about \(78\%\) (\(1.45\times 10^6/1.85\times 10^6\)) of the total COVID-19 infections in India. For our analysis, we employed the real-time data available at the website of Ministry of Health and Family Welfare, Government of India. We have consolidated the data using the Application Programming Interface (APIs) from COVID-19 India Tracker.

For our analysis, we consider data till August 4, 2020. First, we perform a temporal evolution analysis of Infection count, which is denoted by I(t), where t is time in days. In Fig. 1, we plot the time series of I(t) and its derivative \(\dot{I}(t)\) in semi-logy (the y-axis has logarithmic scale, while the x-axis has linear scale) format using red and blue curves respectively. The starting date, listed in Table 1, is chosen from the day the infection began to increase in the respective states.

Table 1 Best-fit functions for the total infections and the corresponding relative errors for major Indian states

Similarly, we studied the evolution of total death cases for six states (Maharashtra, Delhi, Gujarat, Tamil Nadu, West Bengal, and Uttar Pradesh) that have reported a large number of deaths. The cumulative death cases are denoted by D(t). The time series of D(t) and its derivative \(\dot{D}(t)\) are plotted in Fig. 2 in semi-logy using red and blue curves respectively. Also, the starting date (see Table 2) is considered from the day death counts begin to increase. The starting dates for the infection plots and the death plots are not the same. This is because the death cases peaks after a delay from the infection peak due to the incubation period.

We employ exponential and polynomial functions to compute best-fit curves on different regions of I(t) and D(t) data. The time series for both I(t) and D(t) follow exponential regimes during the early phases of the pandemic and subsequently transition to power-law regimes. This is in accordance with the earlier work of Verma et al. (2020) and Chatterjee et al. (2020c). The best-fit functions along with their relative errors are listed in Tables 1 and 2 for the infected and death cases respectively. The error for a given fit is calculated using the mean of the absolute difference between the best-fit curve and the corresponding actual curve. In Table 1, we report \(\mathrm{rel.}\,\mathrm{error} = (\mid I_{\mathrm{fit}}-I_{\mathrm{actual}}\mid \times 100)/I_{\mathrm{actual}}\).

Table 2 Best-fit functions for the death cases and corresponding relative errors for major Indian states

The epidemic curves transition to power-law regimes after the exponential phase. We employ Python’s polyfit function to calculate the best-fit polynomials for these regions. The polyfit function employs regression via minimization of error and provides best-fit curves. Verma et al. (2020) and Chatterjee et al. (2020c) showed that the epidemic curves of many countries pass through a series of polynomials, \(t^3\), \(t^2\), t, \(\sqrt{t}\), before saturation. The intermediate power-law regimes are believed to appear due to lockdown and social restrictions. As shown in Fig. 1, many states deviate from the above patterns, which are possibly due to unlocking in India on June 8. Note that the unlocking of various states occurred at a later date. In the following discussion, we describe how the I(t) curves for various states behave.

The infection curves of Maharashtra, Tamil Nadu, and West Bengal, as well as the combined NE-states, exhibit a \(t^3\) regime followed by a \(t^2\) phase. Whereas, Gujarat and Madhya Pradesh have reached a linear growth after going through a \(t^2\) regime. The I(t) curve of Delhi follows \(t^2\) and linear regimes before reaching a \(\sqrt{t}\) growth. This trend indicates that Delhi is close to saturating its epidemic curve. It is worth noting that the states mentioned above follow the universal pattern of the epidemic curve (Chatterjee et al. 2020c).

Unfortunately, the power-law regime of some states does not follow the universal trend. For instance, Uttar Pradesh reached a \(t^3\) phase after passing through \(t^3\) and \(t^2\) regimes. Also, the infection curve of Rajasthan reached \(t^2\) after following \(t^2\) and linear regimes. These states observed a gradual growth in daily cases as their I(t) curves pass through the power-law regime. However, this growth is still not exponential and hence does not amount to a second wave. Note that such deviations from the universal pattern are indicators for authorities to take suitable action.

The infection curves of Bihar, Kerela, and Karnataka exhibit a rise which is preceded by a region of a linear regime or a nearly flattened curve. Also, this growth of the I(t) curve is exponential indicating a second wave for the epidemic (de Castro 2020). The emergence of this phase corresponds with relaxation in lockdowns and an increase in testing intensity. In Bihar, such a surge may have resulted from the influx of migrant workers and students from different parts of the country. The best-fit curves for the second wave are functions of \(\bar{t}\), where \(\bar{t}=t-t_0\). Here, \(t_0\) corresponds to the day from which the daily count shows an unprecedented rise after a region of decline.

Similar to the I(t) curves, the D(t) curves begin with exponential regimes (\(D(t)= A_d\exp (\beta _d(t))\)), and then transition to power-law regimes (\(t^3\), \(t^2\), t). Interestingly, for many states, the power-laws for both I(t) and D(t) curves are qualitatively similar. For example, both I(t) and D(t) curves for Gujarat exhibit a \(t^2\) region followed by a linear phase (t). This further substantiates the claims of Chatterjee et al. (2020c) that D(t) is proportional to I(t) statistically. This is because a fraction of the infected population is susceptible to death. Note that the growth of D(t) curve has declined in many states with respect to their I(t) curves. This may be attributed to immunity developed in the community, better handling of critical cases, plasma therapy, etc.

The values of \(\beta _i\) and \(\beta _d\) represent the growth rates of infected and death cases, respectively. It must be noted that \(\beta _i\) and \(\beta _d\) depend on various factors such as immunity level, the average age of the population, population density, local policy decisions (lockdowns, testing intensity, social distancing, healthcare facilities), etc.

In Figs. 1 and 2, we also plot daily infection and death counts, which are represented by \(\dot{I}(t)\) and \(\dot{D}(t)\), respectively. We calculate the derivative using Python’s gradient function and take a 5-day moving average in order to smoothen the \(\dot{I}(t)\) and \(\dot{D}(t)\) curves. We observe that in the exponential regimes, the daily counts are proportional to the cumulative number of infected and death cases i.e. \(\dot{I}\approx \beta _i I\) and \(\dot{D}\approx \beta _d D\). Verma et al. (2020) show that power-law regime can be approximated as \(I(t) \sim At^n\), and hence, \(\dot{I}\sim I^{1-1/n}\). Similarly, it can be shown that for power-laws \(\dot{D}\sim D^{1-1/n}\). This shows that the daily counts are suppressed in the power-law region compared to the exponential phase. Note that in the linear growth regime, \(\dot{I} \approx \dot{D}\approx\) constant, implying a constant daily count. The daily count is expected to decrease after a linear regime (see Delhi in Fig. 1), however, this may not be the case when a second wave emerges.

Fig. 3
figure 3

Semi-logy plot of total Infection count (I(t) ) vs. time (t) curves for India (green curve) and India other than Maharashtra, Tamil Nadu and Delhi (magenta curve) where, \(\bar{I}(t) = I(t)_{\mathrm{IND}}-\{I(t)_{\mathrm{MH}}+I(t)_{\mathrm{TN}}+I(t)_{\mathrm{DL}}\}\). The thick blue and brown curves in the plot depict the derivatives of I(t) and \(\bar{I}(t)\) respectively. The dotted curves represent the best-fit curves

Fig. 4
figure 4

Log–log plot of total infected individual vs. time for India (both cases, see Table 3). Both I(t) (green curve) and \(\bar{I}(t)\) (magenta curve) curves follow a power-law, i.e., \(I(t)=At^4\), where \(A=0.007\). The thick blue and brown curves in the plot depict the derivatives of I(t) and \(\bar{I}(t)\) respectively

Table 3 Best-fit functions for cumulative cases and the respective relative errors for various stages of evolution shown in Fig. 3

An interesting question is whether the Indian states with lower COVID-19 cases are closer to saturation. To investigate this issue, we compute the infection time series for India without the three worst affected states, which are Maharashtra, Tamil Nadu, and Delhi. We denote this time series as \(\bar{I}(t)\), and it is computed as \(\bar{I}(t) = I(t)_{\mathrm{IND}} - \{I(t)_{\mathrm{MH}} + I(t)_{\mathrm{TN}} + I(t)_{\mathrm{DL}}\}\). In Fig. 3, we plot \(\bar{I}(t)\) and \(\dot{\bar{I}}(t)\), and compare them with the total I(t) and \(\dot{I}(t)\). From the plots it is evident that both the plots exhibit exponential and power-law regimes (see Table 3), and that \(\bar{I}(t)\) and I(t) are proportional to each other. Although these states comprise of almost \(45\%\) (\(85\times 10^6/185\times 10^6\)) of the total Infection count in India, their removal from total I(t) does not cause any behavioural change in the \(\bar{I}(t)\) curve. Based on these observations we conclude that almost all the affected states shown in Fig. 1 are following similar epidemic evolution.

In Fig. 4, we plot I(t) vs. t curve in log-log (both x-axis and y-axis has a logarithmic scale) format for both cases shown in Fig. 3. In the power-law region, we fit a power-law (\(I(t)=At^n\)) instead of a polynomial curve. The exponential n of power-law fit (\(n=4\)) differs from that calculated using polynomial fit (\(n=2,\,3\)). This analysis indicates that for a epidemic curve, the power-law exponent is typically larger than the highest power of the corresponding polynomials (the best-fit curve). Still, high-order polynomials will lead to larger power-law exponent.

We can summarize the findings of the state-wise epidemic study as follows. Most of the Indian states exhibit rise in the growth of infected cases. Some have reached up to \(t^2\) part of the epidemic evolution, while others have reached the linear regime (\(I(t) \sim t\)). Unfortunately, Uttar Pradesh and Rajasthan show an increasing trend in the power-law phase. While Bihar, Kerala, and Karnataka are observing a second wave of the epidemic. However, Delhi exhibits a decrease in daily cases and is closer to saturation. The overall count in India has shifted from \(t^2\) to \(t^3\). These observations indicate that we are far from saturation or flattening of the epidemic curve.

We conclude in the next section.

Discussions and Conclusions

In this paper, we analyzed the cumulative infection and death counts of the COVID-19 epidemic in the worst-affected states of India. The respective time series, I(t) and D(t) , exhibit exponential and power-law growth in the epidemic. Maharashtra, Tamil Nadu, and West Bengal and combined NE-states have reached \(t^2\) growth. While Gujarat and Madhya Pradesh have reached linear phase. The infection rate in Delhi exhibits a \(\sqrt{t}\) regime which indicates that it is close to flattening its curve. All these states follow the universal trend of the epidemic curve. However, Uttar Pradesh and Rajasthan, as well as states exhibiting a second wave (Bihar, Kerela, and Karnataka) deviate from the universal pattern. We remark that such deviations are indicators for the authorities to take suitable action. The epidemic in India has grown alarmingly after the lockdown restrictions were lifted. Note that the lifting of lockdown is expected to increase the social contacts, and hence the epidemic growth.

Regarding the death count, among the six states we analysed, West Bengal and Uttar Pradesh exhibit \(t^2\) growth, while Tamil Nadu and Maharashtra show linear growth. Delhi and Gujarat have reached \(\sqrt{t}\) regime. These observations indicate that the death rate exhibits a decline as compared to the growth rate of the infected cases. This may be attributed to immunity developed in the population (Tay et al. 2020; Kwok et al. 2020) and better treatment of critical cases (plasma therapy, more ventilators, early detection, etc.). At the initial stage, the death rate and infection rate are nearly proportional to each other, consistent with the earlier observation of Chatterjee et al. (2020c). We also observe that at present, the infection count in the whole country is increasing as \(t^3\). These observations indicate that we are far from the flattening of the epidemic curve.

The present work is based on data analytics, rather than focussing on specific epidemic models which are being constantly revised in order to successfully forecast the epidemic evolution (Peng et al. 2020; López and Rodo 2020; Mandal et al. 2020). Note, however, that the epidemic models involve many free parameters that lead to ambiguities and difficulties in the forecasting of the epidemic evolution. Our focus on data analytics is due to the latter reason. Our work shows that the power laws in the epidemic curves indicate the stage of the epidemic evolution. This feature helps us in contrasting the evolution of the COVID-19 epidemic in various states of India.