Introduction

Over the past few years, coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has spread continuously worldwide, posing a significant threat to public health. A comprehensive understanding of the epidemiological characteristics of COVID-19 underlies the strategic development of region-wide control policies to combat the epidemic. Two fundamental parameters, the basic reproduction number (R0) and the effective reproduction number (R), describe the transmission potential of an infectious disease agent: they are the average number of secondary cases generated by an infectious person in a completely susceptible and a not completely susceptible population, respectively [1]. For COVID-19, however, differences in infectiousness, behavioral patterns, and locally implemented public health interventions give rise to heterogeneous individual transmissibility [2, 3], which cannot be captured by a single measurement of R0 [4].

A superspreading event (SSE) is defined as a transmission event involving an unusually large number of cases, initiated by a super-spreader. SSEs represent a heterogeneous transmission pattern in which the majority of cases are seeded by a small fraction of super-spreaders [5, 6]. This aggregation of transmission among a few superspreading cases has drawn researchers' attention and is known in epidemiology as the “20/80” rule [5], which implies that approximately 80% of secondary infections result from roughly 20% of primary cases. As a distinct feature of the transmission dynamics of COVID-19, SSEs played an essential role in aggravating the COVID-19 epidemics. For instance, in early November 2021 in Hong Kong, a community outbreak was caused by a few SSEs in entertainment venues, which led to a major epidemic wave across the whole city [7]. In South Korea, SSEs seeded by SARS-CoV-2 Omicron variants occurred in churches and schools, causing the disease to spread widely in the local community [8]. Characterizing the superspreading potential of an epidemic in its local context could show policymakers how to curb local transmission effectively [9]. For example, identifying and shutting down hot-spot contact settings that favor the occurrence of SSEs (e.g., bars, social parties, and gyms) could promptly break transmission chains and prevent future large outbreaks. However, despite the increasing burden of COVID-19, few studies have examined the potential for superspreading events in different contact settings.

Providing strong circumstantial evidence of community transmission and SSEs, Furuse et al. reported demographic information on clusters of COVID-19 cases in Japan from January to July 2020 and characterized their transmission chains across the different contact settings of SSEs [10]. This study estimated the effective reproduction numbers and the dispersion parameters of the offspring distributions from the rearranged contact tracing data on these transmission chains [10]. Using transmission cluster data collected during the early phase of the epidemic, we aimed to quantify the transmission risk and compare the superspreading potential of COVID-19 among different contact settings.

Methods

We obtained data on 28 transmission clusters that occurred from January to July 2020 in Japan [10]. Based on the contact tracing and exposure history of each case within the transmission clusters, we constructed infector-infectee transmission pairs. We then extracted the number of secondary cases (i.e., infectees) directly generated by each infector for further analysis. Cases that were only indirectly linked to an infector were excluded. The identified transmission pairs were grouped by contact setting (i.e., community, health care facility, school, household, and workplace) according to where the transmission occurred. Specifically, the contact setting “community” aggregated transmission in social parties, restaurants, bars, clubs, ceremonies, gyms, etc. Further subdivision of the “community” setting was not feasible because the counts in each subgroup were too small for meaningful statistical analysis. Transmission pairs without detailed information on the contact setting were also excluded.
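The tallying step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the case IDs, settings, and the `offspring_counts` helper are hypothetical, and the sketch assumes that every case appearing in a pair but never as an infector contributes a zero to the offspring distribution.

```python
from collections import Counter

# Hypothetical (infector, infectee, setting) transmission pairs. The IDs and
# settings are invented for illustration and are not taken from the dataset.
pairs = [
    ("A", "B", "community"),
    ("A", "C", "community"),
    ("A", "D", "community"),
    ("B", "E", "household"),
    ("F", "G", "workplace"),
]

def offspring_counts(pairs):
    """Return {case_id: number of directly generated secondary cases}."""
    direct = Counter(infector for infector, _, _ in pairs)
    cases = {c for infector, infectee, _ in pairs for c in (infector, infectee)}
    # Cases that never appear as an infector get an explicit zero count.
    return {case: direct.get(case, 0) for case in sorted(cases)}

counts = offspring_counts(pairs)
print(counts)
```

Only direct links are counted: in the toy data, case E is attributed to B and not to A, which mirrors the exclusion of indirectly linked cases.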

To quantify the superspreading potential, we assumed that the number of secondary cases seeded by each infector followed a negative binomial distribution [6], parameterized by an effective reproduction number (R) as the mean and a dispersion parameter (k). The parameter k captures the heterogeneity in individual transmissibility: a lower value of k indicates higher transmission heterogeneity and thereby a higher superspreading potential. The number of offspring cases generated by each seed case was fitted to the negative binomial model, and a Markov chain Monte Carlo (MCMC) method was applied to estimate the joint posterior distribution of R and k.
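As a rough illustration of this kind of fit, the sketch below writes the negative binomial log-likelihood in terms of the mean R and dispersion k and samples the joint posterior with a random-walk Metropolis algorithm. It is a sketch under stated assumptions, not the procedure in Additional file 1: the sampler, the log-uniform priors bounded on [0.001, 100], the step size, and the toy offspring counts are all choices made here for illustration.

```python
import math
import random

def nb_logpmf(x, R, k):
    """Log-probability of x secondary cases under NB(mean R, dispersion k)."""
    return (math.lgamma(k + x) - math.lgamma(k) - math.lgamma(x + 1)
            + k * math.log(k / (k + R)) + x * math.log(R / (k + R)))

def log_likelihood(counts, R, k):
    return sum(nb_logpmf(x, R, k) for x in counts)

# Assumed prior: uniform on log R and log k within these bounds.
LOG_MIN, LOG_MAX = math.log(1e-3), math.log(1e2)

def metropolis(counts, n_iter=5000, step=0.3, seed=1):
    """Random-walk Metropolis on (log R, log k); returns a list of (R, k) draws."""
    rng = random.Random(seed)
    log_r, log_k = 0.0, 0.0  # start at R = k = 1
    current = log_likelihood(counts, math.exp(log_r), math.exp(log_k))
    samples = []
    for _ in range(n_iter):
        prop_r = log_r + rng.gauss(0.0, step)
        prop_k = log_k + rng.gauss(0.0, step)
        # Proposals outside the prior bounds are rejected outright.
        if LOG_MIN < prop_r < LOG_MAX and LOG_MIN < prop_k < LOG_MAX:
            proposed = log_likelihood(counts, math.exp(prop_r), math.exp(prop_k))
            # Symmetric proposal and flat prior on the log scale, so the
            # acceptance probability is the likelihood ratio capped at 1.
            if rng.random() < math.exp(min(0.0, proposed - current)):
                log_r, log_k, current = prop_r, prop_k, proposed
        samples.append((math.exp(log_r), math.exp(log_k)))
    return samples

# Hypothetical offspring counts: many zeros and one large event.
toy_counts = [0] * 15 + [1, 1, 2, 8]
draws = metropolis(toy_counts)
kept = draws[1000:]  # discard burn-in
print("posterior mean R:", sum(r for r, _ in kept) / len(kept))
print("posterior mean k:", sum(k for _, k in kept) / len(kept))
```

Sampling on the log scale keeps both parameters positive without extra constraints; credible intervals can then be read off as quantiles of the retained draws.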

The proportion of the most infectious cases that seeded 80% of the total transmissions was calculated [11]. Based on the estimated parameters, we also computed the expected proportion of infectors generating at least one secondary case, the probability that a seed case generates a cluster of size 10 or more, and the probability of observing an SSE [12,13,14]. Following previous work [6], we defined the threshold of SSEs as the 99th percentile of the Poisson distribution with rate equal to the reproduction number (Additional file 1). A transmission event seeded by a single infector was counted as an SSE if the number of secondary cases exceeded the threshold, and the probability of observing an SSE seeded by a single infector was calculated according to this threshold. The same procedure was applied in a subgroup analysis for each contact setting. A 95% credible interval (CrI) was calculated for each estimate. Technical details of the methodology can be found in Additional file 1.
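The threshold definition and the resulting SSE probability can be sketched as follows. The Poisson rate passed in the demonstration (2.0) is a hypothetical placeholder and not a value taken from this study; the rate actually used is specified in Additional file 1. Whether an event with exactly the threshold number of secondary cases counts as an SSE is a convention, and the helper below computes P(X >= n) as an assumption.

```python
import math

def poisson_percentile(rate, q=0.99):
    """Smallest integer n with P(Poisson(rate) <= n) >= q."""
    n = 0
    pmf = math.exp(-rate)
    cdf = pmf
    while cdf < q:
        n += 1
        pmf *= rate / n
        cdf += pmf
    return n

def nb_pmf(x, R, k):
    """P(X = x) for X ~ negative binomial with mean R and dispersion k."""
    return math.exp(math.lgamma(k + x) - math.lgamma(k) - math.lgamma(x + 1)
                    + k * math.log(k / (k + R)) + x * math.log(R / (k + R)))

def prob_at_least(n, R, k):
    """P(X >= n) for X ~ negative binomial with mean R and dispersion k."""
    return 1.0 - sum(nb_pmf(x, R, k) for x in range(n))

# Hypothetical rate, for illustration only.
threshold = poisson_percentile(2.0)
# Overall point estimates reported in the Results.
p_sse = prob_at_least(threshold, R=0.561, k=0.221)
print(threshold, round(p_sse, 4))
```

With the overall point estimates from the Results (R = 0.561, k = 0.221) and a threshold of 6, `prob_at_least` gives a value close to the reported 1.75%.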

Results

A total of 500 transmission pairs were constructed from the 28 reported transmission clusters. Of the settings in which the identified transmission pairs occurred, 31.1%, 25.6%, 28.7%, 4.0%, and 10.6% were community, household, health care facility, school, and workplace, respectively. Among 1017 identified infectors, 75.0% led to no secondary cases, and 0.8% directly generated more than 10 cases. From the observed secondary case distribution and the fitted negative binomial models, we estimated the overall R and k to be 0.561 (95% CrI: 0.496, 0.640) and 0.221 (95% CrI: 0.186, 0.262), respectively (Table 1).

Table 1 Summary of the estimated metrics of superspreading potentials under different contact settings

Figure 1 illustrates the joint estimates of the reproduction numbers and dispersion parameters in different contact settings, with 95% credible intervals. Based on the estimated R value, the threshold of SSEs was determined to be 6, and 17 out of 500 (3.4%) transmission events were identified as SSEs. We inferred that 80% of total transmissions were generated by 13.14% (95% CrI: 11.55%, 14.87%) of the most infectious seed cases.
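For readers who want to see how a proportion of this kind follows from R and k, the sketch below uses one commonly used interpolation approach for a negative binomial offspring distribution. Treating it as the procedure behind the reported figure is an assumption: the exact method is the one in reference [11] and Additional file 1. The sketch relies on the identity x·NB(x; k)/R = NB(x − 1; k + 1), so the share of transmissions attributable to infectors with fewer than X secondary cases is a cumulative probability of a negative binomial with size k + 1.

```python
import math

def nb_pmf(x, size, p):
    """Negative binomial pmf with shape `size` and success probability `p`."""
    return math.exp(math.lgamma(size + x) - math.lgamma(size) - math.lgamma(x + 1)
                    + size * math.log(p) + x * math.log(1.0 - p))

def proportion_responsible(R, k, frac=0.8):
    """Proportion of most infectious cases expected to cause `frac` of transmissions."""
    p = k / (k + R)
    target = 1.0 - frac
    # Step 1: find X - 1 such that the linearly interpolated CDF of
    # NB(size k + 1, p) equals the share of transmissions NOT covered.
    cum, x = 0.0, 0
    while True:
        px = nb_pmf(x, k + 1.0, p)
        if cum + px >= target:
            cutoff = x + (target - cum) / px + 1.0  # this is X
            break
        cum += px
        x += 1
    # Step 2: interpolated CDF of the offspring distribution NB(size k, p) at X.
    whole = int(math.floor(cutoff))
    below = sum(nb_pmf(i, k, p) for i in range(whole))
    below += (cutoff - whole) * nb_pmf(whole, k, p)
    return 1.0 - below

# Overall point estimates reported above.
p80 = proportion_responsible(R=0.561, k=0.221)
print(round(100 * p80, 2), "% of cases")
```

With the overall point estimates (R = 0.561, k = 0.221) this gives roughly 13%, close to the reported 13.14%; the credible interval would come from applying the same function to each posterior draw of (R, k).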

Fig. 1

Joint estimates of the reproduction numbers and dispersion parameters of COVID-19 for 6 settings (left) and for three of them (right). Each point is a joint estimate of the reproduction number and dispersion parameter. The vertical and horizontal lines through each point indicate the 95% credible intervals of the reproduction number and the dispersion parameter, respectively. The proportion of infectors accounting for 80% of transmissions is indicated for each contact setting

Across all contact settings, the health care facility and household had a higher risk of transmission (larger value of R), whereas school, health care facility, and community had a higher superspreading potential (smaller value of k). The probability that an infector generated at least one secondary case was 24.37% (95% CrI: 21.47%, 27.68%). Furthermore, the probability of observing an SSE under the predefined threshold was 1.75% (95% CrI: 1.57%, 1.99%), and the probability that a seed case generated a transmission cluster of size 10 or greater was 3.87% (95% CrI: 2.94%, 5.24%). Other epidemiological results for these contact settings are shown in Table 1.
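Two of these probabilities have closed forms under a negative binomial offspring distribution, sketched below. The probability of at least one secondary case is 1 − (1 + R/k)^(−k). The cluster-size probability uses the standard final-size distribution of a branching process started by a single seed, counting the seed as part of the cluster; that this matches the definition in Additional file 1 is an assumption made here.

```python
import math

def prob_any_secondary(R, k):
    """P(an infector generates at least one secondary case) = 1 - P(X = 0)."""
    return 1.0 - (1.0 + R / k) ** (-k)

def cluster_size_pmf(j, R, k):
    """P(final cluster size = j, seed included) for NB(mean R, dispersion k) offspring."""
    log_p = (math.lgamma(k * j + j - 1) - math.lgamma(k * j) - math.lgamma(j + 1)
             + (j - 1) * math.log(R / k)
             - (k * j + j - 1) * math.log(1.0 + R / k))
    return math.exp(log_p)

def prob_cluster_at_least(size, R, k):
    """P(final cluster size >= size) for a single seed case."""
    return 1.0 - sum(cluster_size_pmf(j, R, k) for j in range(1, size))

# Overall point estimates reported in the Results.
R_hat, k_hat = 0.561, 0.221
p_any = prob_any_secondary(R_hat, k_hat)
p_big = prob_cluster_at_least(10, R_hat, k_hat)
print(round(100 * p_any, 2), round(100 * p_big, 2))
```

At the overall point estimates, these functions give about 24.4% and about 3.9%, in line with the 24.37% and 3.87% reported above. Because R is below 1, the cluster-size probabilities sum to one, which is a quick sanity check on the formula.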

Discussion

Characterizing the superspreading potential could provide a better understanding of the transmission potential of the COVID-19 pandemic and help to formulate targeted public health interventions. In this study, using transmission cluster data collected during the early phase of the epidemic in Japan, we assessed the superspreading potential of COVID-19 within different contact settings.

The effective reproduction numbers for each contact setting and for the whole population were all less than 1. This is compatible with the epidemic in Japan from January to July 2020 having been brought under control by effective interventions before subsequent waves and the emergence of new variants. The reproduction number for transmission in hospitals was higher than in other settings, and the dispersion parameters for hospitals and schools were small, consistent with there being more vulnerable individuals, or a higher risk of contact with cases, in hospitals, health care facilities, and schools. It was also concluded in [10] that rare superspreading events in the community resulted from infectors from hospitals, health care facilities, or schools, whereas some cases in hospitals, health care facilities, or schools were caused by transmission chains originating from community superspreading events, which may lead to a low dispersion parameter in the offspring distribution for communities.

We found that the early epidemic in Japan exhibited a substantial superspreading potential (k = 0.22), which is in line with another study conducted during a similar period (k = 0.23) [15] but is smaller than an estimate obtained in Hong Kong (k = 0.43) [16]. This discrepancy could be attributed to differences in the control policies imposed. In Japan, cluster-based measures that focused on identifying and preventing transmission clusters were adopted to curb the epidemic [17]. In Hong Kong, by contrast, a series of social distancing interventions, including school closures, work-from-home policies, and cancellation of mass gatherings, were implemented [18], which may have had a greater effect on reducing the potential for societal SSEs [19] and thus resulted in a relatively higher k. The choice of the SSE threshold also affects the assessment of superspreading potential [20]; we defined the threshold for COVID-19 as the 99th percentile of the Poisson distribution with rate equal to the basic reproduction number (R0). Meanwhile, the super-aged society in Japan [10] may also be an underlying factor behind the estimates in each setting.

We also found that the risk of transmission and the superspreading potential varied across contact settings. The higher estimated superspreading potential in schools and communities is consistent with a study conducted in South Korea, in which the transmission chains in communities and schools were more heterogeneous (smaller k) than those in households [21]. Beyond Hong Kong and South Korea, communities in some other regions have also shown relatively greater superspreading potential than other contact settings, given the high likelihood of community gatherings for religious and folk customs, such as the Kumbh Mela during April and May in India and the Songkran festival in Thailand [22, 23]. Furthermore, consistent with part of our results, household transmission in the UK showed higher secondary attack rates than community transmission, with relatively lower rates in larger households [24].

Limitations

This study has some limitations. Firstly, the transmission cluster data are subject to biases (e.g., recall bias) arising during the contact tracing process, so it is plausible that some cases exposed to the clusters were missed. This imperfect case ascertainment may lead to an underestimation of R but an overestimation of k [25]. Secondly, infectors who generated infectees may have received disproportionate attention and were therefore more likely to be identified and reported than those who did not. Thirdly, more combinations of different types of contact settings could be considered when some places are interconnected through ventilation. Finally, the transmission clusters included in our study occurred during the early stage of the COVID-19 epidemic. Given that the current epidemics are dominated by SARS-CoV-2 Omicron variants, further study is warranted to assess the superspreading potential of emerging variants in Japan and other regions to help formulate control policies.

Conclusion

In conclusion, the early COVID-19 epidemic in Japan demonstrated a substantial potential for superspreading. In particular, schools, health care facilities, and the community had a relatively higher superspreading potential than other contact settings. The differing superspreading potential across contact settings highlights the need to continuously monitor transmissibility together with the dispersion parameter, so that high-risk settings favoring the occurrence of SSEs can be identified in a timely manner.