Abstract
We study the effectiveness and limitations of contact-tracing, quarantine, and lockdown measures used in India to control the spread of COVID-19 infections. Using data provided in the media bulletins of the Government of Karnataka, we observe that the so-called 20–80 rule holds for secondary infections and classify them into clusters. Using a mixture of Poisson with Gamma model, we establish that clusters show variation in deceased rates (\(0\%\)–\(17.31\%\)), low reproduction numbers (\(0.21\)–\(0.77\)), small dispersion (\(0.06\)–\(0.18\)), and that superspreading events can occur. Further, migration due to relaxation in lockdown is unlikely to be the sole cause of the recent surge. The methodology presented is universal in nature and can be applied whenever such precise data is available.
Introduction
For COVID-19, in the absence of a vaccine, the key measures to contain the spread of infection have been lockdowns, contact tracing, quarantine and testing, along with wide publicity of social distancing norms, hygiene guidelines, and awareness of the symptoms of the disease and its treatment. There are many efforts to understand control measures such as lockdowns, contact tracing and quarantine with respect to the spread of COVID-19 using stochastic models; see e.g. (Joel et al. 2020; Ferretti et al. 2020). Contact tracing and other control measures were also used by countries during the Severe Acute Respiratory Syndrome (SARS) epidemic; see (Lipsitch et al. 2003; Steven et al. 2003) for a detailed analysis of control efforts and of clusters initiated by “superspread” events (SSEs) and community transmission. In Ramanan et al. (2020), the authors study the epidemiology and transmission of COVID-19 in two states of India, namely Tamil Nadu and Andhra Pradesh, using testing and contact-tracing data.
In Lloyd-Smith James et al. (2005), the authors argue that studying only the basic reproduction number can obscure individual variation in infectiousness. Their motivation was ‘superspreading events’, in which certain individuals infected unusually large numbers of secondary cases (5–10 in the SARS epidemic). They studied contact tracing data from eight directly transmitted diseases and showed that the distribution of individual infectiousness around the basic reproduction number is skewed. Using various models, they then compare the effect of individual-specific control measures with that of population-wide measures. They conclude that superspreading events are a normal feature of disease spread and give a formal definition of the same.
To contain the spread of COVID-19 infections in India, the Union Government started a strict lockdown on 25th March and relaxed it over 5 phases as follows: Lockdown Phase 1 (25th March–14th April) and Lockdown Phase 2 (15th April–3rd May) were the strictest in terms of mobility; Lockdown Phase 3 (4th May–17th May) and Lockdown Phase 4 (18th May–31st May) included relaxations in travel between states; and Unlock 1.0 (1st–30th June) and Unlock 2.0 (1st–31st July) had considerable relaxations.
In Karnataka, a state of India with a population of approximately 70 million, quarantine measures and contact tracing were put in place from the very beginning for all patients who tested positive. Since 9th March 2020, the Government of Karnataka has been providing detailed media bulletins (Novel Coronavirus (COVID-19) 2020) containing specific guidelines on the virus and information on each patient who tested positive in the state.
In this article we study the trace history provided in the media bulletins and try to understand the spread of the disease in the state of Karnataka in the period from 9th March till 21st July 2020. From the trace history (see Covid-19 India-timeline an understanding across states and union territories 2020) we classify the patients who tested positive into several clusters. We analyse each cluster and the spread of disease within it. We also comment on the possible reasons for the spurt in cases from 27th June 2020 onwards.
Materials and Methods
The COVID-19 media bulletins of the State of Karnataka, from 7th March to 26th June, provided detailed information on the patients who tested positive. In particular, there was data on how each of them contracted the virus (either due to travel history or by being a contact of someone who had already tested positive for COVID-19) or what led to them being tested (either as a Severe Acute Respiratory Infection patient or as someone with influenza-like symptoms).
Clusters
We first classify the cases that tested positive into clusters based on the source of infection, for example “From Europe” or “Pharmaceutical Company Nanjangud”. Then in each cluster we place all the patients who contracted the virus independently from the place of origin, and then recursively add the patients to whom they passed the infection.
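The recursive construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used on the bulletins: the record format, the patient ids and the helper names `assign_clusters` and `offspring_counts` are hypothetical. Each record lists either a cluster label (for a patient who contracted the virus independently, a "parent") or the id of the patient they were listed as a contact of.

```python
# Hypothetical records: (patient_id, source). `source` is a cluster label
# (str) for an independently infected "parent", or the patient id (int)
# of the case this patient was listed as a contact of.
records = [
    (1, "From Europe"),
    (2, 1),
    (3, 2),
    (4, "Pharmaceutical Company Nanjangud"),
    (5, 4),
    (6, 4),
]

def assign_clusters(records):
    """Return {patient_id: (cluster_label, generation)}; parents are generation 1."""
    source = dict(records)
    assigned = {}

    def resolve(pid):
        if pid in assigned:
            return assigned[pid]
        src = source[pid]
        if isinstance(src, str):
            # Parent: starts the trace within its cluster.
            assigned[pid] = (src, 1)
        else:
            # Contact: inherits the cluster of the patient it is traced to.
            label, generation = resolve(src)
            assigned[pid] = (label, generation + 1)
        return assigned[pid]

    for pid in source:
        resolve(pid)
    return assigned

def offspring_counts(records):
    """Number of listed contacts (children) per patient."""
    counts = {pid: 0 for pid, _ in records}
    for _, src in records:
        if not isinstance(src, str):
            counts[src] += 1
    return counts

clusters = assign_clusters(records)
children = offspring_counts(records)
```

With the toy records above, patient 3 lands in “From Europe” as a grandchild (generation 3), and patient 4 is assigned two children. The sketch assumes that every contact id refers to a patient present in the records and that the trace history has no cycles.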
Before Phase 1 (25th March–14th April) of the lockdown began, almost all the COVID-19 cases that were confirmed in Karnataka were either individuals who had some form of international travel history (from the Middle East, USA, South America, United Kingdom and the rest of Europe) or those who were contacts of such individuals. Phase 1 and Phase 2 (15th April–3rd May) of the lockdown in Karnataka saw heavy restrictions on travel, and nearly all services and factories were suspended.
During Phase 1 and Phase 2 of the lockdown, a pharmaceutical company in Nanjangud, Mysore, saw a sudden increase in COVID-19 cases. Although the exact route by which the infection reached the company is unknown, the first patient to be infected (a 35-year-old male, confirmed to be infected on 26th March) came in contact with health care workers treating COVID-19 patients. Another cluster that began during this period was the “TJ Congregation”, which contained those who attended the Tablighi Jamaat Congregation from 13th to 18th March in Delhi. The first patient in this cluster was confirmed as a COVID-19 case on 2nd April. Both these clusters were very well contained: the last patients to be attributed to them tested positive on 29th April and 21st May respectively, and no more patients have been attributed to these clusters since then.
Phase 3 and 4 of the lockdown loosened restrictions on domestic travel, and many infected individuals had some domestic travel history. The state saw a large influx of infected individuals from states like Maharashtra, Gujarat, Rajasthan and the Southern States (Tamil Nadu, Telangana and Andhra Pradesh). There were also patients whose source of infection was listed as inter-district travel in Karnataka or travel to foreign countries or other states, as well as healthcare workers and policemen on COVID-19 duty and their contacts. The cases due to these reasons were too few to form separate clusters, so we placed all these patients in a cluster called “Others”.
Testing strategy in India is governed by ICMR guidelines. The guidelines of 20th March mandated that all Severe Acute Respiratory Illness patients (i.e., patients with fever AND cough and/or shortness of breath) should be tested for COVID-19, while the guidelines of 4th April mandated the same for all symptomatic patients with Influenza-Like Illness (fever, cough, sore throat, runny nose). Thus two other clusters that began during Phase 1 and Phase 2 of the lockdown were the Severe Acute Respiratory Infection (“SARI”) cluster (first infection 7th April) and the Influenza-Like Illness (“ILI”) cluster (first infection 15th April). These clusters contain those patients who have a history of SARI (respectively ILI), and those who can be traced back as contacts of such patients. It should be noted that only the first generation of patients in these clusters are those with a history of SARI (respectively ILI); the subsequent contacts of these patients need not be.
In the media bulletins, patients whose contact tracing was incomplete were listed as ‘Contact Under Tracing’. We have assumed that these patients did not fall under SARI or ILI and placed them in a cluster called “Unknown”, along with their contacts who tested positive.
An initiative taken by the government was to create Containment Zones in certain regions. The guidelines for these zones were clearly specified. The first case in contact with a containment zone was reported on 24th April, and a large fraction of the subsequent increase in this cluster occurred during Phase 3 (4th May–17th May) and Phase 4 (18th May–31st May) of the lockdown. For all these clusters, there was no information provided on the source of infection for the ‘parents’.
Our consolidated list of clusters is then given by
Reproduction number and Dispersion
In epidemiology, the “basic reproduction number” of an infection, denoted by \(R_0\), can be thought of as the expected number of cases that contract the infection directly from one case. Thus, on average, each infected person passes on the infection to \(R_0\) healthy individuals. As mentioned earlier, in Karnataka during the period 9th March–26th June we have observed the COVID-19 infection spread in a controlled environment. So whenever we calculate basic reproduction numbers, we are actually calculating the short-term effective reproduction number of the disease during this period. To be cognizant of this, we shall use the notation \(R_{\hbox {eff}}\) to denote the basic reproduction number for a cluster instead of the usual notation \(R_0\).
We will examine Reproduction number and dispersion for “The 8 clusters” in this section, namely:
These began before 3rd May 2020 and have more than 50 individuals. There are ten clusters that satisfy these criteria from (2.1). We have omitted two clusters from analysis which satisfy these criteria, namely: “From Maharashtra” and “From Middle East”. We will analyse them in a later section. In Fig. 3 we present a summary distribution of parents, children, grandchildren, and great grandchildren in each of “The 8 clusters”.
For each individual i in the cluster, we denote by \(y_{i}\) the number of children (that is, the number of positive cases) assigned to patient i. This means that there were \(y_i\) positive cases whom the media bulletins listed as ‘Contact of Patient-i’. The mean of \(y_i\) is the basic reproduction number \(R_{\hbox {eff}}\). In Table 1 we present a comparison of the summary distribution parameters (Maximum, Zeroes, Size, etc.) across clusters, and we see that the variance does not match the mean. Further, as noted in Fig. 1, heterogeneity in the infectiousness of each individual implies that \(R_{\hbox {eff}}\) by itself is not a good measure of the infection spread. To account for the large variance, we consider the standard method of a mixture of Poisson distributions to model the data set. For each cluster, using the Negative Binomial with mean \(R_{\hbox {eff}}\) and dispersion k (see Lloyd-Smith James et al. 2005 and Section A for details) as the offspring distribution, we use the Maximum Likelihood method to estimate \(R_{\hbox {eff}}\) and k (see Section B for details). Using the methods developed in Saha and Paul Sudhir (2005), we provide a 95% confidence interval for k and, conditional on the estimates, we perform the \(\chi^2\) goodness-of-fit test. The details of the above can be found in Sections B, C, and D of the Appendix.
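To make the estimation step concrete, the following is a minimal sketch of a Maximum Likelihood fit of the Negative Binomial with mean \(R_{\hbox {eff}}\) and dispersion k to a list of offspring counts \(y_i\). It is an illustration only and not the procedure of Section B: the function names, the grid search over k and the toy counts are assumptions made here. The sketch uses the fact that, in this parameterization, the Maximum Likelihood estimate of the mean is the sample mean, and then maximizes the log-likelihood over a log-spaced grid of k values.

```python
import math

def nb_log_pmf(y, r_eff, k):
    """log P(Z = y) for a Negative Binomial with mean r_eff and dispersion k."""
    return (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
            + k * math.log(k / (k + r_eff))
            + y * math.log(r_eff / (k + r_eff)))

def fit_negative_binomial(counts, k_min=1e-3, k_max=1e3, steps=4000):
    """Return (r_eff_hat, k_hat, log_likelihood).

    Assumes at least one non-zero count, so that the sample mean is positive.
    The grid bounds and resolution are arbitrary choices for illustration.
    """
    r_eff = sum(counts) / len(counts)      # ML estimate of the mean
    best_k, best_ll = None, -math.inf
    for step in range(steps + 1):
        k = k_min * (k_max / k_min) ** (step / steps)   # log-spaced grid
        ll = sum(nb_log_pmf(y, r_eff, k) for y in counts)
        if ll > best_ll:
            best_k, best_ll = k, ll
    return r_eff, best_k, best_ll

# Hypothetical offspring counts (not data from the bulletins):
# many zeroes and a few large values, as in a skewed cluster.
hypothetical_counts = [0] * 80 + [1] * 10 + [2] * 5 + [5] * 3 + [12] * 2
r_hat, k_hat, log_lik = fit_negative_binomial(hypothetical_counts)
```

A small fitted k signals strong overdispersion relative to a Poisson offspring distribution, which is recovered in the limit of large k. A grid search is used here only because it is easy to verify; a numerical optimizer would be the more usual choice.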
Cases due to Migration in Phase 3, 4 and Unlockdown 1.0
As mentioned earlier, we omitted two clusters from the analysis, namely “From Maharashtra” and “From Middle East”. Phase 3 and 4 of the lockdown, along with Unlockdown 1.0 in June, loosened restrictions on domestic and international travel. The state saw a large influx of infected individuals from within India and abroad. During Phase 3 of the lockdown, the “From Maharashtra” cluster saw the most growth and dominated the test positive counts by a significant margin. The “From Maharashtra” cluster accounted for approximately \(52.5\%\) of cases in the stipulated period. The “From Middle East” cluster seems to have two phases. The first occurred before the enforcement of the lockdown, during which international travel was suspended. The second, more recent, was due to the repatriation flights from the region. We provide the Maximum Likelihood estimators for \(R_{\hbox {eff}}\) and k, along with their summary in Table 1. During this period domestic and international travellers were quarantined/tested on arrival. To make any meaningful inferences using reproduction numbers and dispersion, one would have to take into account a more detailed tracing history from their origin of travel.
To understand cases due to Migration (6871 out of 10391) in this period we reorganized our clusters from (2.1) into four groups, namely “From Maharashtra”, “Inter-State Travel”, “Foreign group” and “Inter-District Travel”.
Data
We have sourced all our data from the Daily Media Bulletins of Government of Karnataka: https://karnataka.gov.in/common10/en (till 27th April, 2020) and https://covid19.karnataka.gov.in/govt_bulletin/en (post 27th April, 2020). The media bulletins were very detailed and contained the following information till 21st July, 2020. We have converted them from their PDF format into usable CSV format and made them publicly available for use at our Data Repository at https://www.isibang.ac.in/~athreya/incovid19/.
Results
One rule of thumb for disease spread, reported anecdotally for COVID-19 as well, is the 20/80 rule: 80% of the secondary infections arise from 20% of the primary infections. From Fig. 1, it can be observed that for Karnataka, almost 20% of the individuals with the highest infectiousness are responsible for 70% of the total infections. The large deviation from the straight line \(y=x\) represents the heterogeneity in the infected population.
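One simple empirical reading of the 20/80 rule is the share of all secondary infections assigned to the top 20% of individuals when ranked by their number of children. The sketch below computes that share; the function name `share_from_top` and the toy counts are hypothetical, and this ranking by observed counts is an assumption that need not coincide with how Fig. 1 was constructed.

```python
def share_from_top(counts, fraction=0.2):
    """Share of all secondary infections assigned to the top `fraction` of individuals."""
    ordered = sorted(counts, reverse=True)
    top_n = max(1, round(fraction * len(ordered)))
    total = sum(ordered)
    return sum(ordered[:top_n]) / total if total else 0.0

# Hypothetical offspring counts (not data from the bulletins).
toy_counts = [0] * 70 + [1] * 20 + [3] * 6 + [10] * 4
top_share = share_from_top(toy_counts)   # 68 of 78 secondary infections
```

Plotting this share against the fraction of individuals, for all fractions between 0 and 1, gives a concentration curve; the further it lies from the line \(y=x\), the more heterogeneous the infectiousness.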
If we consider the entire data as one Karnataka cluster, then we find that its effective reproduction number is 0.2021 and its dispersion is 0.0358, with a 95% confidence interval given by (0.033, 0.039). However, to better understand the variations in the spread of infection, we present findings from each of the eight clusters. (Figs. 2, 3)
“The 8 clusters”

Heterogeneity and Variation: In Table 2, we have computed the Maximum Likelihood estimates for \(R_{\hbox {eff}}\) and k for “The 8 clusters” and also performed the \(\chi^2\) goodness-of-fit test (see Section C for details regarding the goodness of fit). In Fig. 5 we have plotted the histogram from the derived Negative Binomial probabilities for each of “The 8 clusters” along with the observed relative frequencies of the number of infections caused. We have marked the 95th and 99th percentile for these distributions in the plot. In Table 2 and Fig. 4 we provide the confidence intervals for the dispersion parameter with respect to “The 8 clusters”.
The “TJ Congregation” and the “Pharmaceutical Company Nanjangud” clusters have the highest \(R_{\hbox {eff}}\) among all clusters. The p-values provided in Table 2 are not small for any cluster except “Pharmaceutical Company Nanjangud”. This cluster has a very high variation, a maximum data point at 24 (i.e., one person who has been assigned 24 secondary infections) and also a significant proportion of individuals with exactly 1 secondary infection. One can also see that the confidence interval for the dispersion of the “Pharmaceutical Company Nanjangud” cluster is quite large, as seen in Fig. 4, and that its histogram differs from the Negative Binomial model in Fig. 5.
For each cluster in “The 8 clusters”, we found that the basic reproduction number is less than 1 but the variance is larger than the mean. The distribution of secondary infections across all clusters is very skewed, with a significant mass at 0 due to the control measures taken. From the Negative Binomial model, we note that for most clusters the dispersion is low and is contained in a small confidence interval. Thus, though the clusters will most likely die out under the controlled environment, there is a reasonable chance of superspreading events occurring.

Superspreading events: In Fig. 2, we examine the distribution of the number of infections designated as contacts of infected individuals. A large peak is seen at 0 infections caused. It can be seen that only 9 individuals in the population of 4895 have passed the infection on to more than 20 people. This could be the result of a superspreader phenomenon or perhaps an effect of how the contact tracing and testing is performed. Assigning them as definitely arising from one particular individual will need a more careful understanding of the latter. One can further note that, due to effective quarantine measures, there are 4265 infected individuals who have not passed the infection on to anyone else.
Note that in Table 2, the largest number of secondary infections assigned to an individual is quite high for some clusters. This might be indicative of the superspreading phenomenon. From Lloyd-Smith James et al. (2005), a general protocol for defining a superspreading event is as follows: (1) estimate the effective reproductive number, \(R_{\hbox {eff}}\), for the disease and population in question; (2) construct a Poisson distribution with mean \(R_{\hbox {eff}}\), representing the expected range of Z (without individual variation); (3) define a superspreading event as any infected individual who infects more than \(Z_n\) others, where \(Z_n\) is the nth percentile of the Poisson(\(R_{\hbox {eff}}\)) distribution.
If \(R_{\hbox {eff}}\) and k have been estimated then one can use the definition and the Negative Binomial model to understand the probability with which such events will occur.
If we consider a 99th percentile event with \(R_{\hbox {eff}}=0.3447\), then an event causing more than 2 secondary infections would be considered a superspreading event. In the “Containment Zone” cluster, there is a person who has been assigned 7 secondary infections; this would be considered a superspreading event. Under the Negative Binomial model the probability of observing 7 secondary infections is 0.0027. This may indicate one of two possibilities: either a very rare event has occurred, or it is an effect of the testing and contact tracing method that was followed.
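The threshold in this example can be checked with a short computation of the Poisson percentile \(Z_n\) used in the protocol. This is a minimal sketch: the function names are hypothetical, and the percentile is taken to be the smallest z with \(P(Z \le z) \ge n/100\), which is an assumption about the convention.

```python
import math

def poisson_percentile(mean, q):
    """Smallest z with P(Z <= z) >= q for Z ~ Poisson(mean), with 0 < q < 1."""
    z = 0
    pmf = math.exp(-mean)      # P(Z = 0)
    cdf = pmf
    while cdf < q:
        z += 1
        pmf *= mean / z        # P(Z = z) from P(Z = z - 1)
        cdf += pmf
    return z

def is_superspreading_event(secondary_infections, r_eff, q=0.99):
    """True if the individual infected more than the q-th Poisson percentile."""
    return secondary_infections > poisson_percentile(r_eff, q)

z_99 = poisson_percentile(0.3447, 0.99)
```

For \(R_{\hbox {eff}}=0.3447\) the cumulative Poisson probabilities at 0, 1 and 2 are approximately 0.708, 0.953 and 0.995, so \(Z_{99}=2\) and an individual with 7 secondary infections is classified as a superspreading event.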
The relative frequency of superspreading events within “The 8 clusters” can be calculated using Table 2 and Fig. 5. The above indicates that the infection can be stemmed quicker by containing these superspreading events by using effective contact tracing.

Variation over time: If we consider 7th April and Descendants till 21st April, then there were 290 patients who tested positive and out of them 219 did not pass the infection to anyone else. There was one person who had been assigned 24 secondary infections and the mean number of secondary infections was at 0.6793 with a variance of 4.482. In contrast, if we consider the period 7th April to 3rd May and Descendants till 17th May, then there were 615 patients who tested positive and out of them 491 did not pass the infection to anyone else. There was one person who had been assigned 45 secondary infections and the mean number of secondary infections was at 0.7512 with a variance of 9.946.
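The means and variances quoted above already indicate strong overdispersion. As a rough cross-check, one can use the Negative Binomial relation variance \(= m + m^2/k\), where m is the mean, to obtain a method-of-moments value for k. This is a simple sketch with a hypothetical function name; it is a swapped-in moment estimator and not the Maximum Likelihood estimate used elsewhere in this article.

```python
def moment_dispersion(mean, variance):
    """Method-of-moments dispersion k from variance = mean + mean**2 / k."""
    if variance <= mean:
        return float("inf")    # no overdispersion relative to Poisson
    return mean ** 2 / (variance - mean)

# Means and variances quoted in the text for the two periods.
k_first = moment_dispersion(0.6793, 4.482)     # roughly 0.12
k_second = moment_dispersion(0.7512, 9.946)    # roughly 0.06
```

Both values are well below 1, and the second is smaller than the first, reflecting the larger variance once the individual with 45 assigned secondary infections is included.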
The basic reproduction number is by no means a unique number for a disease or for that matter within a cluster. It greatly varies: with time from beginning to end; within a region due to its population density; and with interventions put in place to curb the spread of the infection. In Fig. 6 we compute the reproduction number for each of “The 8 clusters” studied and note that there is a significant variation over time. The “Pharmaceutical Company” cluster seems to have a reproduction number of 4 during the first week and then tapers off to 0 in five weeks. The “TJ Congregation” cluster also has a reproduction number that has variation over time but eventually due to tracing and testing tapers off to 0. “SARI” and “ILI” clusters have fluctuations throughout the period, due to new parents being added to the cluster.

Variation over Generations: Table 1 contains information on “The 8 clusters” with respect to generations within them. The maximum number of infections caused by an individual in the first generation is 30. An individual in the “Influenza like illness” cluster and another in the “Others” cluster have caused 30 secondary infections each. Among the individuals in the second generation, the one to have caused 51 infections belongs to the “Others” cluster. It is observed that the mean number of secondary infections for patients belonging to Generation 4 (Great Grandchild) is 0.7042, significantly higher than in the remaining generations. This is because of the small size of the generation (142) and one of the patients being assigned 45 secondary infections. While the highest generation that can be observed is Generation 6 (Great great great grandchild), it has not been included in Table 1, as there is no Generation 7 for any cluster and so all the individuals in Generation 6 are assigned 0 infections.
A heat map representing the mean infections caused, as studied across clusters and generations, is seen in Fig. 6. All clusters have been contained within 5 generations; this is seen from the fact that the mean is 0 for the final generation of each cluster, which is at most the fifth. One can see that the “TJ Congregation” and “Pharmaceutical Company Nanjangud” clusters have variation in mean across generations, with the mean number of infections decreasing with each generation. These clusters closed out and did not add any new patients as per the bulletins. Generation 3 in the “SARI” cluster shows a very high mean. This is because there were only 17 individuals there, out of which 1 person had been assigned 45 secondary infections. Similarly, the “Pharmaceutical Company Nanjangud” cluster had one person among 22 parents who was assigned 25 secondary infections.

Variation with Age: We consider the age distribution across “The 8 clusters” in Fig. 8a. It is seen that the coronavirus patients include a higher fraction in the age group 25 and above, and fewer in the range \(0\)–\(25\), when compared to the actual demographic distribution in Karnataka. A possible reason for this might be that many cases were restricted to travel of working professionals. The state also took steps quite early on to lock down schools and universities to prevent the younger segment of the population from being affected. The patients in the age group \(0\)–\(15\) are either primary or secondary contacts of someone in their respective cluster.
In Fig. 6, we consider a heat map of ages across “The 8 clusters”. Patients below the age of 10 and those above 90 have a very low mean number of secondary infections caused. Most secondary infections are caused by middle-aged people, who are the most socially active. For both “SARI” and “ILI” the age group \(70\)–\(90\) has higher means. This could be because of caretakers and close family contracting the infection before the patient tested positive. The “TJ Congregation” has a higher mean across all groups from \(10\)–\(80\), and the “Pharmaceutical Company Nanjangud” has similar features. This is perhaps due to the fact that the “TJ Congregation” cluster arose from a meeting in Delhi and the “Pharmaceutical Company Nanjangud” cluster consisted solely of company employees and their contacts.

Deceased and Recovery Rates: Table 2 contains the observed recovery and deceased rates of patients in each of “The 8 clusters”. It can be seen in Table 2 that the recovery rates are much higher than the deceased rates. The “Pharmaceutical Company Nanjangud” cluster had no one above the age of 70 and, perhaps consequently, has the highest rate of recovery with 0 deaths. The highest deceased rate is seen in the “SARI” cluster, where it is around 17%. The “ILI” cluster also has a higher death rate than the remaining clusters. This again is perhaps due to the fact that the parents in the “SARI”/“ILI” clusters had a higher viral load. The remaining clusters have death rates between 1–5%.
Before 26th June, 95% of the cases and 59% of the deaths occurred in individuals less than 65 years old. The case fatality rate is 2.414%. Among the patients in Karnataka who were deceased, 66% passed away before they tested positive. Among the deceased patients who tested positive while hospitalized, the median number of days before they passed away was 3. The highest number of days a patient was treated before passing away was 36. From the detailed information on deceased patients, it is also known that around 70% of them had comorbidities.
We also plot the days to recover (in Fig. 8c) and days to decease (in Fig. 8e) among patients belonging to “The 8 clusters” who tested positive before 26th June. It is seen that many patients who have passed away did so on Day 0. This is because their samples, which tested positive, were sent for testing after their passing. It can be observed that the bulk of the deceased patients are between \(45\)–\(75\) years. There does not seem to be any observable correlation between days to recover and age. We caution against making significant inferences from this graph as the “recovery policy” has changed with time (see e.g. the 1st April and 8th May Guidelines).

Variation across Districts: In Fig. 6 we have plotted a heat map of the mean number of secondary infections in each week for the different districts. This provides a framework for the time evolution of the reproduction number across districts, as done in compartmental models. Most districts in Karnataka started having their COVID-19 cases quite late, during early May. Bangalore-Rural, Bellary, Davangere, Dharwad, Karwar, Kodagu, Tumkur and Udupi have several weeks where no one tested positive, as earlier outbreaks were well contained. The pharmaceutical company in Nanjangud is in the Mysore district, and the end of the outbreak there is visible. In Davangere district the week 27th April to 3rd May has a large mean because of a patient who was infected on 29th April and was assigned 30 secondary infections, and one on 30th April who was assigned 18 secondary infections. A few such individuals are typically responsible whenever a weekly mean is large. Most of the districts have a low mean number of secondary infections in May. This was mainly because those who tested positive in this month had migrated from other states and caused very few recorded secondary infections.
Migration in Phase 3, 4 and Unlockdown 1.0

Variation across Districts: The “From Maharashtra” group (or cluster) affected the districts of Kalaburagi (1113 cases), Udupi (1020 cases) and Yadgir (891 cases) the most. Bangalore-Urban received only 85 cases from Maharashtra (see Fig. 7a). The “Inter-State Travel” group affected Bangalore-Urban and Mysore the most, though the absolute numbers were very low at 99 and 46 cases respectively (see Fig. 7b). The “Foreign group” contributed 379 cases (via airports in Bangalore and Mangalore), of which \(76\%\) were detected in the Dakshina Kannada district (see Fig. 7c) and 43 were assigned to Bangalore-Urban district. The “Inter-District Travel” group affected Ballari the most, with 214 cases. Mysore and Bangalore-Urban ranked next, but their absolute counts were quite low at 49 and 30 respectively (see Fig. 7d).
As seen from above, Kalaburagi, Udupi and Yadgir were overall the worst affected districts. The three districts together received \(45.2\%\) of all infections due to migration, while Bangalore-Urban received 257 cases due to migration (see Fig. 7e).

Migration versus Total: In Table 3 we compute the percentage of cases due to the migration group across districts during this period. We observe that Dakshina Kannada (\(84\%\)), Kalaburagi (\(89.8\%\)), Mandya (\(95.3\%\)), Raichur (\(90\%\)), Udupi (\(94.3\%\)) and Yadgir (\(98\%\)) had a very large proportion of their total cases due to the migration group. In contrast, in Bangalore-Urban migration accounted for \(14.3\%\) of the total cases (see Fig. 7f).

Variation in Age, Recovery and Deceased: The histogram of the age distribution of the migration cases shows that the distribution is concentrated around 20–40 years, as seen in Fig. 8b. There is also a higher proportion of cases in the 0–20 age range, and a smaller proportion of elderly people, as compared to the histogram in Fig. 8a (which has the age distribution for the infected individuals belonging to the eight clusters studied earlier), indicating that most of the migrating individuals were families. This is probably because more children migrated along with parents, but very few elderly people did. Out of the 6871 migration cases, only 25 people have succumbed to the disease (as seen till 21st July). This is perhaps due to the fact that the elderly formed a smaller proportion than in the 8 clusters that we analysed earlier. There were no casualties in the “Foreign group”. All but one person who passed away were more than 40 years of age, as seen in Fig. 8f. There was also a high recovery rate, with 6657 people recovering (as seen till 21st July). Most people recovered within 20 days of testing positive, as seen in Fig. 8d. Again we caution against making significant inferences from this graph as the “recovery policy” has changed with time.
Surge in July
There was a sudden surge in cases in Karnataka after the migration period (4th May to 26th June). On 26th June the total cases in the state stood at 11005; this had nearly tripled by 9th July (31105 cases) and had grown more than sixfold by 21st July (71068 cases). We will try to outline the possible reasons for this surge.

ILI/SARI: From the middle of June, the “ILI” cluster cases in Karnataka have been increasing, and there was a sharp rise in the first half of July. They also formed a significant proportion of total cases. In Bangalore-Urban district, the “ILI” cluster cases have likewise been increasing since the middle of June, with a sharp rise in the first half of July, and they formed a significant proportion of total cases, exceeding \(50\%\) on some days. The “SARI” cluster also shows an increase, but its proportion fluctuates and is low, around \(5\%\).

Variation across districts: Bangalore-Urban accounted for approximately \(50\%\) of the surge in July, with the count being 1953 on 26th June and rising to 34691 by 21st July. In Kalaburagi, there were 1339 cases on 26th June and 2973 cases on 21st July. In Udupi and Yadgir, the cases roughly doubled from 26th June to 21st July (Udupi 1126 cases on 26th June, 2406 cases on 21st July; Yadgir 916 cases on 26th June, 1713 cases on 21st July).
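The district shares of the surge follow from the cumulative counts quoted in this section and the state totals given at the start of it. The following is a small arithmetic sketch; the dictionary layout and the function name `share_of_surge` are illustrative assumptions.

```python
# Cumulative case counts quoted in the text: (26th June, 21st July).
state_total = (11005, 71068)
district_counts = {
    "Bangalore-Urban": (1953, 34691),
    "Kalaburagi": (1339, 2973),
    "Udupi": (1126, 2406),
    "Yadgir": (916, 1713),
}

def share_of_surge(district):
    """Fraction of the state's new cases in the period that arose in `district`."""
    start, end = district_counts[district]
    return (end - start) / (state_total[1] - state_total[0])

bangalore_share = share_of_surge("Bangalore-Urban")
```

With these figures Bangalore-Urban accounts for 32738 of the 60063 new cases, a little over half, while Kalaburagi, Udupi and Yadgir together account for only about 6%.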
Discussion
From 27th June to 29th June the media bulletins did not provide any description for the patients who tested positive and from 30th June onward the description was not as detailed as before. Post 21st July the media bulletins did not contain any individual information on those who tested positive. A disproportionately large number of cases were designated as contact under tracing and thus fell in the “Unknown” cluster (see Fig. 9a, c), making it impossible to proceed on a precise analysis for “The 8 clusters”.
Our cluster classification was based on the trace history, which reflects how contact tracing was done and how infected individuals were identified for testing. It is important to note that the parent-to-child relationship in the trace history is indicative of the testing policy and contact tracing that were followed and need not be a definitive indicator of the genealogy of the infection spread. Among the four clusters where the source of infection of the parents is not known (“Containment Zone”, “SARI”, “ILI” and “Unknown”), the mean number of secondary infections in the first generation is highest for “SARI”, followed by “ILI”. This is because the first generation in the “SARI”/“ILI” clusters consists of individuals with a history of SARI/ILI, who display symptoms and have a high viral load. Also, the “SARI”/“ILI” clusters indicate some local transmission in the state, making complete accounting of secondary infections via manual contact tracing a big challenge. In the analysis of “The 8 clusters” we noticed that the “SARI” and “ILI” clusters had \(R_{\hbox {eff}}\) less than 1, but there was a regular addition of new parents to these clusters. The continuing growth of these clusters indicates the presence of viral load in the population. This could be due to one of several reasons: patients in the “SARI”, “ILI” and “Unknown” clusters were not entirely contact traced or, as their infection source was unknown, there were significantly many silent spreaders who did not fall into the contact tracing network.
Further, one could infer that the severe restrictions imposed by definition in the “Containment Zone” are proving effective, with the mean number of secondary infections across all generations being less than one. Finally, the parents in the “Unknown” cluster presumably consist of patients who, at the time of testing, had mild symptoms or were asymptomatic (being part of random testing conducted routinely). If this is definitive, then one could conclude that the effective mean reproduction number for patients in this category is given by that of the “Unknown” cluster.
Another aspect to be considered is the testing policy that was followed in May and June. There have been variations over time, such as non-uniform testing of the population across districts (e.g. testing only migrants in Phase 3 and Phase 4 due to capacity constraints), and COVID-19 contact workers (health, law and order, sanitation) not being tested enough in the earlier months, so that they were inadvertently spreading the virus. The fraction of positive tests is around \(6.77\%\) on 21st July, the highest fraction recorded up to this period. On 15th May, 2020 it reached an all-time low of \(0.7\%\). The total number of tests conducted up to 21st July is 1,049,982, which includes RAT, RT-PCR and other testing techniques. The number of tests done using each technique was not reported before 17th July. These provide a comprehensive count of testing numbers in the state but not cluster-wise testing data. The number of infected individuals in the population differs from the number of positive test results, because not every individual in the population has been tested; equating the two may therefore be an error. The State of Karnataka conducted a serological survey recently which provided insight on missed cases, with a case-to-infection ratio of 1:40 (Babu Giridhara et al. 2021).
Finally, it seems unlikely that the Migration group in Phase 3, Phase 4 and Unlockdown 1.0 is the reason for the surge. We have already noted that the districts most affected by migration are Kalaburagi, Udupi and Yadgir (see Fig. 7g and above). From Fig. 9 we also note that the Migration group did not account for a significant proportion of cases during the end of June and July, and that the current surge was driven sharply by the cases in Bangalore Urban district.
Notes
In the epidemiological literature k is referred to as the dispersion and \(k >0\) is assumed, while in the statistics literature \(\frac{1}{k}\) is referred to as the dispersion, given the connection with the Gamma distribution, and is allowed to take negative values greater than \(-\frac{1}{R_{\hbox {eff}}}\).
References
Babu Giridhara R, Rajesh S, Siva A, Jawaid A, Kumar PP, Maroor PS, Rajagopal PM, Lalitha R, Mohammed S, Lalitha K et al (2021) The burden of active infection and anti-SARS-CoV-2 IgG antibodies in the general population: results from a statewide sentinel-based population survey in Karnataka, India. Int J Infect Dis 108:27–36
Covid-19 India-Timeline an understanding across states and union territories (2020). http://www.isibang.ac.in/~athreya/incovid19. Accessed Mar 2020
Ferretti L, Wymant C, Kendall M, Zhao L, Nurtay A, Abeler-Dörner L, Parker M, Bonsall D, Fraser C (2020) Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing. Science 368:6491
Joel H, Sam A, Amy G, Bosse NI, Jarvis CI, Russell TW, Munday JD, Kucharski AJ, John EW, Fiona S et al (2020) Feasibility of controlling COVID-19 outbreaks by isolation of cases and contacts. The Lancet Global Health 8(4):e488–e496
Lipsitch M, Cohen T, Cooper B, Robins James M, Stefan M, James L, Gopalakrishna G, Chew Suok K, Tan Chorh C, Samore Matthew H et al (2003) Transmission dynamics and control of severe acute respiratory syndrome. Science 300(5627):1966–1970
Lloyd-Smith James O, Schreiber Sebastian J, Ekkehard KP, Getz Wayne M (2005) Superspreading and the effect of individual variation on disease emergence. Nature 438(7066):355–359
Novel Coronavirus (COVID-19) (2020) Media bulletin, Government of Karnataka, Department of Health and Family Welfare, Bengaluru. https://covid19.karnataka.gov.in/govt_bulletin/en. Accessed Mar 2020
Ramanan L, Brian W, Reddy DS, Gopal K, Neelima S, Jawahar RKS, Radhakrishnan J, Lewnard JA et al (2020) Epidemiology and transmission dynamics of COVID-19 in two Indian states. Science 370(6517):691–697
Saha KK, Paul Sudhir R (2005) Bias-corrected maximum likelihood estimator of the intraclass correlation parameter for binary data. Stat Med 24(22):3497–3512
Steven R, Christophe F, Donnelly Christl A, Ghani Azra C, Abu-Raddad Laith J, Hedley Anthony J, Leung Gabriel M, Lai-Ming Ho, Tai-Hing L, Thach Thuan Q et al (2003) Transmission dynamics of the etiological agent of SARS in Hong Kong: impact of public health interventions. Science 300(5627):1961–1966
Acknowledgements
We would like to thank Gautam Menon for introducing us to the question on dispersion and also for pointing us to Lloyd-Smith James et al. (2005). We would like to thank Rajesh Sundaresan for providing us detailed suggestions that improved the presentation of the paper. Further, we would like to thank Biswadeep Karmakar, P. Shankar, Deepayan Sarkar and Mohan Delampady for feedback and discussions. We also thank the anonymous referee for a careful reading of the paper.
Appendices
Appendix A Model
Let the random variable \(\nu\) represent the number of infections caused by a particular infected individual, called the individual infectiousness. We will model \(\nu\) as coming from a probability distribution with mean \(R_{\hbox {eff}}\). In particular we will assume \(\nu\) is \(\hbox {Gamma}\) distributed with mean \(R_{\hbox {eff}}\) and dispersion parameter k for some \(k >0\) and \(Z \sim \hbox {Poisson}(\nu )\), allowing Z to represent the number of secondary infections caused by each infected individual. A standard calculation shows that for \(z =0, 1,2,3, \ldots\)
\[ P(Z = z) = \frac{\Gamma (z+k)}{z!\,\Gamma (k)} \left( \frac{k}{k+R_{\hbox {eff}}}\right) ^{k} \left( \frac{R_{\hbox {eff}}}{k+R_{\hbox {eff}}}\right) ^{z}. \]
Thus one interprets Z as having a Negative Binomial distribution with mean \(R_{\hbox {eff}}\) and dispersion k (see Notes). It can also be seen that Z has variance \(R_{\hbox {eff}}\left( 1+\frac{R_{\hbox {eff}}}{k}\right)\). Thus smaller values of k indicate larger variance. Depending on the heterogeneity, different models can also be chosen. If one assumes \(\nu = R_{\hbox {eff}}\), then we are assuming a homogeneous population where each individual has the same infectiousness; this corresponds to \(k =\infty\) and implies \(Z \sim \hbox {Poisson} (R_{\hbox {eff}})\). If we set \(k =1\), then \(\nu\) is Exponential with mean \(R_{\hbox {eff}}\) (which arises from mean-field models assuming uniform infection and recovery rates), and this implies that Z is Geometric with mean \(R_{\hbox {eff}}\).
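As a minimal sketch of the statements above, the following Python snippet evaluates the Negative Binomial probabilities of the Poisson with Gamma mixture and checks numerically that the mean is \(R_{\hbox {eff}}\) and the variance is \(R_{\hbox {eff}}(1+R_{\hbox {eff}}/k)\). The function name `nb_pmf`, the parameter values and the truncation point of the support are assumptions made for illustration; they are not estimates from the data.

```python
import math

def nb_pmf(z, r_eff, k):
    """P(Z = z) for the Poisson-Gamma mixture with mean r_eff and dispersion k."""
    log_p = (math.lgamma(z + k) - math.lgamma(k) - math.lgamma(z + 1)
             + k * math.log(k / (k + r_eff))
             + z * math.log(r_eff / (k + r_eff)))
    return math.exp(log_p)

# Illustrative values only (hypothetical, not taken from any cluster).
r_eff, k = 0.5, 0.1

# Truncate the support where the remaining tail mass is negligible.
support = range(2000)
probs = [nb_pmf(z, r_eff, k) for z in support]

total = sum(probs)
mean = sum(z * p for z, p in zip(support, probs))
var = sum((z - mean) ** 2 * p for z, p in zip(support, probs))

# Compare the numerical variance with R_eff * (1 + R_eff / k).
print(total, mean, var, r_eff * (1 + r_eff / k))
```

With a small k most of the probability mass sits at zero while the variance is several times the mean, which is the sense in which smaller k indicates larger individual variation.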
Appendix B Maximum Likelihood Estimate
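Given data \({{\mathbf {y}}}:=\{y_i\}_{i=1}^n\), the log-likelihood (modulo constant terms) is
\[ \ell (R_{\hbox {eff}}, k) = \sum _{i=1}^n \left[ \log \Gamma (y_i+k) - \log \Gamma (k) + k \log \frac{k}{k+R_{\hbox {eff}}} + y_i \log \frac{R_{\hbox {eff}}}{k+R_{\hbox {eff}}} \right] . \]
We follow Lloyd-Smith James et al. (2005) to estimate \(c = \frac{1}{k}\). First we rewrite the (conventionally accepted) log-likelihood as a function of \(R_{\hbox {eff}}\) and \(c=\frac{1}{k}\):
\[ \ell (R_{\hbox {eff}}, c) = \sum _{i=1}^n \left[ y_i \log R_{\hbox {eff}} - \left( y_i + \frac{1}{c}\right) \log (1 + c R_{\hbox {eff}}) + \sum _{j=0}^{y_i-1} \log (1+cj) \right] . \]
It is then standard (see Saha and Paul Sudhir 2005) that the Maximum Likelihood Estimator for \(R_{\hbox {eff}}\) is the sample mean, i.e. \({\hat{R}}_0 = \frac{1}{n}\sum _{i=1}^n y_i\), and the Maximum Likelihood Estimator for c is a solution to
\[ \sum _{i=1}^n \left[ \sum _{j=0}^{y_i-1} \frac{j}{1+cj} + \frac{1}{c^2} \log (1 + c {\hat{R}}_0) - \left( y_i + \frac{1}{c}\right) \frac{{\hat{R}}_0}{1+c {\hat{R}}_0} \right] = 0. \qquad (\hbox {B.1}) \]
Equation (B.1) cannot be solved for c explicitly, so a numerical approximation scheme is used to obtain an approximate value of c. We use the uniroot function in R.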
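The numerical step can be sketched as follows. This is an illustration under stated assumptions, not the code used for the paper: the score function is the derivative in c of the Negative Binomial log-likelihood with \(R_{\hbox {eff}}\) fixed at the sample mean, a plain bisection stands in for R's uniroot, and the offspring counts `ys`, the bracket `[1e-4, 1e3]` and the names `score_c` and `solve_c` are hypothetical.

```python
import math

def score_c(c, ys):
    """Derivative in c of the log-likelihood, with R_eff fixed at the sample mean."""
    n = len(ys)
    r_hat = sum(ys) / n
    total = 0.0
    for y in ys:
        total += sum(j / (1.0 + c * j) for j in range(y))
        total += math.log1p(c * r_hat) / c ** 2
        total -= (y + 1.0 / c) * r_hat / (1.0 + c * r_hat)
    return total

def solve_c(ys, lo=1e-4, hi=1e3, iters=200):
    """Bisection on score_c; requires score_c(lo) > 0 > score_c(hi)."""
    if not (score_c(lo, ys) > 0 > score_c(hi, ys)):
        raise ValueError("no sign change on the bracket; data may not be overdispersed")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score_c(mid, ys) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical offspring counts for one cluster (not data from the bulletins).
ys = [0] * 70 + [1] * 15 + [2] * 6 + [3] * 4 + [5] * 2 + [8] * 2 + [15]

r_hat = sum(ys) / len(ys)   # maximum likelihood estimate of R_eff
c_hat = solve_c(ys)         # numerical root of the score equation
k_hat = 1.0 / c_hat         # dispersion estimate
print(r_hat, c_hat, k_hat)
```

A sign change on the bracket is expected when the sample variance exceeds the sample mean; when it does not, the sketch raises an error rather than returning a value.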
Appendix C \(\chi ^2\)goodness of fit test
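Given data \({{\mathbf {y}}}:=\{y_i\}_{i=1}^n\), let \({\hat{R}}_0\) and \({\hat{k}}\) be the maximum likelihood estimators of \(R_{\hbox {eff}}\) and the dispersion k. To see if the Negative Binomial with mean \({\hat{R}}_0\) and dispersion \({\hat{k}}\) is a good fit for the data \({{\mathbf {y}}}\), we shall perform the \(\chi ^2\) goodness-of-fit test. We will restrict the range to \(\{0, 1, \ldots ,B\}\) with \(B = \min \{n+1,20\}\). Let \(y_1,y_2, \dots , y_n\) be the offspring data from a given cluster, let Z be Negative Binomial with mean \({\hat{R}}_0\) and dispersion \({\hat{k}}\), and let
\[ O_j = \#\{i : y_i = j\}, \quad E_j = n P(Z = j), \quad 0 \le j \le B-1, \qquad O_B = \#\{i : y_i \ge B\}, \quad E_B = n P(Z \ge B). \]
Then consider the statistic
\[ \mathbf{X}^2 = \sum _{j=0}^{B} \frac{(O_j - E_j)^2}{E_j}. \]
As we have estimated two parameters, it is known that \(\mathbf{X}^2\) has the \(\chi ^2\) distribution with \(B-2\) degrees of freedom, asymptotically as \(n \rightarrow \infty\). One way to test if Z is the correct fit for the cluster is to compute the
\[ p\hbox {-value} = P\left( \chi ^2_{B-2} \ge \mathbf{X}^2\right) . \]
There is strong evidence against the possibility that the data arose from that model if the p-value is very small.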
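The test can be sketched in Python as below. Several choices here are assumptions for illustration: counts of B or more are pooled into the last cell, the offspring counts `ys` are hypothetical, the value passed as `k_hat` is illustrative rather than a maximum likelihood estimate, and `chi2_sf` is a small hand-written upper-tail routine based on the recurrence \(Q(a+1,h) = Q(a,h) + h^a e^{-h}/\Gamma (a+1)\) for the regularized incomplete gamma function.

```python
import math

def nb_pmf(z, r_eff, k):
    """Negative Binomial probability with mean r_eff and dispersion k."""
    log_p = (math.lgamma(z + k) - math.lgamma(k) - math.lgamma(z + 1)
             + k * math.log(k / (k + r_eff))
             + z * math.log(r_eff / (k + r_eff)))
    return math.exp(log_p)

def chi2_sf(x, df):
    """Upper tail P(chi^2_df >= x) for integer df >= 1."""
    if x <= 0:
        return 1.0
    h = 0.5 * x
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)                 # Q(1, h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))      # Q(1/2, h)
    while a < 0.5 * df - 1e-9:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q

def nb_gof(ys, r_hat, k_hat):
    """Pearson statistic, degrees of freedom and p-value for the fitted model."""
    n = len(ys)
    B = min(n + 1, 20)
    probs = [nb_pmf(j, r_hat, k_hat) for j in range(B)]
    probs.append(1.0 - sum(probs))               # assumed pooled cell {Z >= B}
    observed = [sum(1 for y in ys if y == j) for j in range(B)]
    observed.append(sum(1 for y in ys if y >= B))
    expected = [n * p for p in probs]
    # Cells with zero expected count (possible only through rounding) are skipped.
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    df = B - 2                                   # B + 1 cells, two estimated parameters
    return stat, df, chi2_sf(stat, df)

# Hypothetical offspring counts; r_hat is their mean, k_hat is illustrative.
ys = [0] * 70 + [1] * 15 + [2] * 6 + [3] * 4 + [5] * 2 + [8] * 2 + [15]
stat, df, p_value = nb_gof(ys, r_hat=sum(ys) / len(ys), k_hat=0.2)
print(stat, df, p_value)
```

Cells with very small expected counts can dominate the statistic, so in practice one may prefer to merge sparse cells before applying the asymptotic \(\chi ^2\) approximation.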
Appendix D Confidence Intervals
To compute the confidence interval for the Negative binomial dispersion parameter k, we compute it for its reciprocal c and then invert it. We noted earlier that the maximum likelihood estimate for c had to be solved numerically and it is known that the asymptotic sampling variance is given by a series expansion (See Saha and Paul Sudhir 2005). Let \({\hat{c}}\) and \({\hat{R}}_0\) be the M.L.E. obtained. Then let
Then the variance of \({\hat{c}}\) is given by
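The 95% confidence interval for c is then given by
\[ \left( {\hat{c}} - z_{0.95} \sqrt{\hbox {Var}({\hat{c}})}, \; {\hat{c}} + z_{0.95} \sqrt{\hbox {Var}({\hat{c}})} \right) , \]
with \(z_{0.95}\) being the 95th percentile of the standard normal distribution. The \(95\%\) confidence interval for k is then given by
\[ \left( \frac{1}{{\hat{c}} + z_{0.95} \sqrt{\hbox {Var}({\hat{c}})}}, \; \frac{1}{{\hat{c}} - z_{0.95} \sqrt{\hbox {Var}({\hat{c}})}} \right) . \]
Note that the above interval will not be symmetric around \({\hat{k}}\) due to the inversion. For the computation of the variance in (D.1) we use a tolerance of \(10^{-10}\).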
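The inversion step can be illustrated with a short Python sketch. It takes the estimate of c and its variance as given inputs and does not implement the series expansion for the variance; the function name `ci_for_k` and the numerical inputs are hypothetical, and the treatment of a non-positive lower limit for c (reporting an infinite upper limit for k) is an assumption.

```python
from statistics import NormalDist

def ci_for_k(c_hat, var_c, level=0.95):
    """Interval for c from a normal approximation, inverted to an interval for k."""
    z = NormalDist().inv_cdf(level)      # percentile of the standard normal, as in the text
    half_width = z * var_c ** 0.5
    c_lo, c_hi = c_hat - half_width, c_hat + half_width
    k_lo = 1.0 / c_hi
    # Assumption: if the lower limit for c is not positive, k is unbounded above.
    k_hi = 1.0 / c_lo if c_lo > 0 else float("inf")
    return (c_lo, c_hi), (k_lo, k_hi)

# Hypothetical inputs, chosen only to show the asymmetry after inversion.
c_hat, var_c = 8.0, 4.0
(c_lo, c_hi), (k_lo, k_hi) = ci_for_k(c_hat, var_c)
k_hat = 1.0 / c_hat
print((c_lo, c_hi), (k_lo, k_hi), k_hat)
```

The interval for c is symmetric around the estimate, while the interval for k extends further above \(1/{\hat{c}}\) than below it.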
Cite this article
Athreya, S., Gadhiwala, N. & Mishra, A. Effective Reproduction Number and Dispersion under Contact Tracing and Lockdown on COVID-19 in Karnataka. J Indian Soc Probab Stat 22, 319–342 (2021). https://doi.org/10.1007/s41096021001061
DOI: https://doi.org/10.1007/s41096021001061
Keywords
 Variation
 Individual infectiousness
 Maximum likelihood
 Negative binomial
 Superspreading event.