1 Introduction

A growing number of cases in an outbreak caused by a novel coronavirus (COVID-19) have been identified since December 2019. As of 17 March 2020 there were almost 180,000 confirmed infections globally, including 7426 deaths [16]. The World Health Organization (WHO) declared the outbreak a pandemic on 11 March 2020. The outbreak has already surpassed in size those caused by the two other major coronaviruses to date, severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV). The transmission rate of COVID-19 appears to be the highest among all circulating human CoVs [17], and the disease is still spreading rapidly despite very stringent measures taken by the Chinese authorities, including city lockdowns that suspended all travel by air, rail and highway. The lockdown of Wuhan, the epicentre of the outbreak in Hubei province, took effect at 10 am local time on 23 January 2020. Equally strict interventions were applied in several other cities in Hubei province and in cities in other provinces with high numbers of secondary input cases. Transmission was likely accelerated by ChunYun, the national migration before the Lunar New Year, which generated enormous traffic volumes to and from Wuhan and other major cities in China.

Because many aspects of COVID-19 are not yet fully understood, it is important to evaluate its transmissibility during the initial phase of the outbreak. Available epidemiological reviews, case reports and contact risk histories provide crucial information for building dynamic transmission models to estimate the basic reproduction number (\(R_0\)) of COVID-19. \(R_0\) is defined as the average number of secondary cases that one infected person generates in a fully susceptible population over his or her entire infectious period. It is a standard measure of transmissibility, and containing an outbreak requires reducing the effective reproduction number below 1. In this paper, we fitted a Poisson transmission model, under assumptions of either symptomatic infection only or potential asymptomatic infection and using current knowledge of the natural history of COVID-19, to two kinds of data: symptom onset records available from the China CDC website [2], and onset records augmented, using the reporting delay, from daily confirmed case reports available from the World Health Organization (http://who.int) [16] and the National Health Commission of the People’s Republic of China (http://www.nhc.gov.cn/).

Researchers from several institutions have reported initial modelling results for the early phase of the COVID-19 outbreak. Read et al. [13] and Wu et al. [17] incorporated traffic data across Chinese cities and international flights into deterministic SEIR transmission models. Their natural history assumptions were based on early hypotheses about COVID-19 and on knowledge of SARS and MERS. Li et al. [9] provided detailed onset records of the first 425 confirmed cases, estimated the incubation period and serial interval from contact tracing, and applied this knowledge in a renewal-equation epidemic model to estimate \(R_0\). Zhao et al. [18] fitted an exponential growth model to reporting-rate-adjusted lab-confirmed case reports from Chinese authorities. The estimated \(R_0\) values were generally above 2 in all of these studies, but some were as high as 3.8 [13] and 3.58 [18]. The wide range of estimates is likely caused by the limited availability of high-quality data and by over-simplified model assumptions. We expect estimates from different approaches to converge as understanding of the emerging disease improves and more high-quality outbreak data become available.

The paper is organized as follows. Section 2 describes the data structure, the assumptions about the natural history of COVID-19 and the likelihood-based Poisson model for evaluating transmissibility. Section 3 describes the data sources, presents the estimates of \(R_0\) and uses sensitivity analyses to assess the impact of different model assumptions. Section 4 applies our models to updated onset data that became available after our first submission. Section 5 summarizes the findings from the transmission model, discusses the advantages and drawbacks of the current approach, and suggests directions for extending the model as the outbreak progresses.

2 The Statistical Transmission Model

Suppose the outbreak is observed daily, with integer day t covering the continuous interval \((t-1,t]\), and the available data run from day 1 to day T. For each infected individual i, \(i=1,\ldots ,N\), let \({\tilde{t}}_i\) be the symptom onset time and \({\hat{t}}_i\) the corresponding infection time, where \({\hat{t}}_i < {\tilde{t}}_i\). The case confirmation time \({\bar{t}}_i\) is later than \({\tilde{t}}_i\) for several reasons, such as limited understanding of a novel disease during its emerging phase and delays in laboratory processing and reporting.

Fig. 1 Illustrative examples of infection, symptom onset, hospitalization and virus lab confirmation times

We provide a hypothetical example of case event times and contact tracing to help readers understand the natural history of an infectious disease. The incubation period \(({\hat{t}},{\tilde{t}})\) is the time from infection to the development of symptoms. As illustrated in Fig. 1, the incubation periods of cases 1 to 5 are 4, 4, 5, 4 and 3 days. Accurately estimating these event times from contact histories is important for understanding the natural history of the disease. For instance, if we know that cases 1 and 2 were in close contact with cases 4 and 5, but case 3 had no contact with any other case, we would conclude that a patient is unlikely to transmit the disease before symptom onset. If, instead, cases 3 and 4 were in close contact and case 3 was isolated from case 4 after hospitalization, this would indicate that an infected case may be infectious before showing any symptoms. Such conclusions are, however, very difficult to draw during an emerging outbreak, because \({\hat{t}}_i\) is unavailable for most cases and the contact history between cases can be complex. In addition, there is usually a delay between symptom onset, hospitalization and laboratory confirmation. Daily reports of confirmed cases are usually indexed by laboratory confirmation time and are therefore less accurate for recent infections.

2.1 Assumption of Disease Natural History

The exact infection time \({\hat{t}}\) is difficult to determine for most infectious pathogens, so a reasonable assumption about the incubation period distribution is needed. We assume that the length of the incubation period of COVID-19 is beta-distributed on \(({\text {Inc}}_{\min }, {\text {Inc}}_{\min } + \Delta _{\text {inc}})\), where \({\text {Inc}}_{\min }\) and \({\text {Inc}}_{\min } + \Delta _{\text {inc}}\) are the minimum and maximum lengths of the incubation period. The density of symptom onset at time t given infection at \({\hat{t}}\) is

$$\begin{aligned}&p_{{\text {inc}}}(t\mid {\hat{t}}) = f_{{\text {beta}}}\left( \frac{t - {\hat{t}} - {\text {Inc}}_{\min }}{\Delta _{{\text {inc}}}}\mid \alpha _{{\text {inc}}},\beta _{{\text {inc}}}\right) ,\\&\quad t \in ({\hat{t}} + {\text {Inc}}_{\min },{\hat{t}} + {\text {Inc}}_{\min } + \Delta _{{\text {inc}}}), \end{aligned}$$

where \(f_{{\text {beta}}}(\cdot |\alpha _{{\text {inc}}},\beta _{{\text {inc}}})\) is the beta density function. The mean incubation length is \({\text {Inc}}_{\min } + \frac{\Delta _{{\text {inc}}} \alpha _{{\text {inc}}}}{\alpha _{{\text {inc}}}+\beta _{{\text {inc}}}}\). Other densities, such as the log-normal [9], Weibull [5] or gamma distributions, may be used as well, provided they fit the observed incubation periods of COVID-19. We used the beta distribution because its finite support directly encodes the shortest and longest incubation periods, avoiding the need to truncate the right tail of a distribution with infinite support.

We also assume that the latent period (the time from infection to the start of infectiousness) equals the incubation period, so a case can infect others only after symptom onset. The relative infectivity during the infectious period \(({\tilde{t}},{\tilde{t}} + \Delta _{{\text {inf}}})\) is assumed to be

$$\begin{aligned} p_{{\text {inf}}}(t\mid {\tilde{t}}) = f_{{\text {beta}}}\left( \frac{t - {\tilde{t}}}{\Delta _{{\text {inf}}}}\mid \alpha _{{\text {inf}}},\beta _{{\text {inf}}}\right) ,\quad t \in ({\tilde{t}},{\tilde{t}} + \Delta _{{\text {inf}}}). \end{aligned}$$
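Fitting these densities to daily counts requires putting them on a day grid. The sketch below is one way to do that, under our own assumptions rather than the authors' code: a lag distributed as \(\text{lo} + \text{width}\times\text{Beta}(a,b)\) assigns to day d the probability mass of \((d-1, d]\), matching the convention that day t covers \((t-1,t]\). The function name `daily_pmf` is hypothetical.

```python
import numpy as np
from scipy.stats import beta


def daily_pmf(lo, width, a, b):
    """Discretize a lag distributed as lo + width * Beta(a, b) onto whole days.

    Entry d holds the probability mass of the interval (d - 1, d], so entry 0
    holds the mass at or below 0.
    """
    days = np.arange(int(np.ceil(lo + width)) + 1)
    # scipy's beta.cdf returns 0 below the support and 1 above it
    cdf = beta.cdf((days - lo) / width, a, b)
    pmf = np.diff(cdf, prepend=0.0)
    return pmf / pmf.sum()  # guard against rounding; the last cdf value is 1
```

For example, `daily_pmf(1, 13, 6, 12)` gives an incubation pmf that reads the 1 to 14 day range used later as \({\text {Inc}}_{\min } = 1\) and \(\Delta _{{\text {inc}}} = 13\). `daily_pmf(0, 10, 2, 3)` gives a relative-infectivity pmf over 0 to 10 days after onset, with illustrative, hypothetical values \(\alpha _{{\text {inf}}} = 2\) and \(\beta _{{\text {inf}}} = 3\).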

2.2 Poisson Transmission Model for Daily Disease Onset Record

Let \(\gamma\) be the average hazard of secondary infection over the infectious period. The overall infection hazard generated by all infected cases in the community is

$$\begin{aligned} {\hat{\lambda }}(t) = \sum _{i:t\in [{\tilde{t}}_i, {\tilde{t}}_i + \Delta _{{\text {inf}}}]} p_{{\text {inf}}}(t|{\tilde{t}}_i) \Delta _{{\text {inf}}} \gamma . \end{aligned} \tag{1}$$

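With the incubation period density \(p_{{\text {inc}}}(t|{\hat{t}})\) and the maximum incubation length \({\text {Inc}}_{\max } = {\text {Inc}}_{\min } + \Delta _{{\text {inc}}}\), the intensity of disease onset at time t is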

$$\begin{aligned} {\tilde{\lambda }}(t) = \int _{t - {\text {Inc}}_{\max }}^{t - {\text {Inc}}_{\min }} {\hat{\lambda }}(\tau ) p_{{\text {inc}}}(t|\tau ) {\mathrm{{d}}} \tau . \end{aligned}$$

Let \({\tilde{\varLambda }}(t) = \int _{t-1}^{t} {\tilde{\lambda }}(s) {\mathrm{{d}}} s\) be the cumulative onset intensity on day t. Assuming that the number of observed symptom onsets on day t is Poisson distributed with mean \({\tilde{\varLambda }}(t)\), the likelihood of the onset record \(\tilde{{\mathbf {N}}} = ({\tilde{N}}_1,\ldots ,{\tilde{N}}_T)\) is

$$\begin{aligned} p(\tilde{{\mathbf {N}}}|\gamma , \theta _{{\text {inc}}}, \theta _{{\text {inf}}}) = \prod _{t=1}^{T} \frac{e^{-{\tilde{\varLambda }}(t)} {\tilde{\varLambda }}(t)^{{\tilde{N}}_t} }{ {\tilde{N}}_t ! }. \end{aligned} \tag{2}$$

\(R_0\) is by definition the expected number of infections caused by one case, so it can be estimated as \({\hat{R}}_0 = {\hat{\gamma }} \Delta _{{\text {inf}}}\), where \({\hat{\gamma }}\) is the maximum likelihood estimate (MLE) of the average hazard of secondary infection. The likelihood approach is similar to that of [3]. However, instead of making assumptions about the serial interval (the time between onset of the index case and onset of the secondary case), we separate the contributions of the incubation period and of the relative infectivity in the transmission model.
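Under a daily discretization of Eqs. (1) and (2), the fitting step can be sketched as follows. This is a minimal sketch under our own assumptions, not the authors' implementation. Time is binned into days. The two densities are replaced by daily probability mass functions `inf_pmf` and `inc_pmf`, each summing to one, which is the normalization under which \(R_0 = \gamma \Delta _{{\text {inf}}}\) counts expected secondary cases per case. Both pmfs have zero mass at lag 0. The function names are hypothetical.

With these assumptions, the expected number of onsets on day t is \(R_0 c_t\), where \(c_t = \sum _s {\tilde{N}}_s w_{t-s}\) and w is the convolution of the two pmfs. Because this intensity is linear in \(R_0\), maximizing (2) over the days with \(c_t > 0\) has the closed form \({\hat{R}}_0 = \sum _t {\tilde{N}}_t / \sum _t c_t\). The observed Fisher information \(\sum _t {\tilde{N}}_t / {\hat{R}}_0^2\) gives a Wald interval.

```python
import numpy as np


def onset_kernel(inf_pmf, inc_pmf):
    """Daily weights w[k]: expected secondary onsets k days after one onset, per unit R0.

    Baseline model: infectiousness starts at onset (inf_pmf is indexed by days
    since onset), and each secondary infection is followed by an incubation
    period (inc_pmf is indexed by days since infection).
    """
    return np.convolve(inf_pmf, inc_pmf)


def fit_r0(onsets, kernel, z=1.96):
    """Closed-form Poisson MLE of R0 = gamma * Delta_inf with a Wald interval.

    Expected onsets on day t are R0 * c[t], with c[t] = sum_s onsets[s] * kernel[t - s].
    kernel[0] must be 0 so that c[t] only uses earlier days. Days with c[t] == 0
    (seed cases that no earlier case can explain) are conditioned on.
    """
    onsets = np.asarray(onsets, dtype=float)
    c = np.convolve(onsets, kernel)[: len(onsets)]
    used = c > 0
    n_used = onsets[used].sum()
    r0 = n_used / c[used].sum()
    # Observed Fisher information of the log-likelihood in R0 is n_used / r0**2
    se = r0 / np.sqrt(n_used)
    return r0, (r0 - z * se, r0 + z * se)
```

For example, `fit_r0(onsets, onset_kernel(inf_pmf, inc_pmf))` returns the point estimate and interval. Because the estimator is a ratio of sums, it is cheap to recompute for every sensitivity scenario.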

2.3 Probable Infectiousness Before Symptom Onset

We previously assumed that the latent period equals the incubation period. However, there have been reports of confirmed COVID-19 cases with mild or even no symptoms who infected their family members. If we instead assume that infectiousness begins at the time of infection, the modified relative infectivity is \(p_{{\text {inf}}}^*(t|{\hat{t}}) = f_{{\text {beta}}}(\frac{t - {\hat{t}}}{\Delta ^*_{{\text {inf}}}}|\alpha ^*_{{\text {inf}}},\beta ^*_{{\text {inf}}}) ,\quad t \in ({\hat{t}},{\hat{t}} + \Delta ^*_{{\text {inf}}})\). Consequently, Eq. (1) becomes

$$\begin{aligned} {\hat{\lambda }}(t) = \sum _{i:t\in [{\hat{t}}_i, {\hat{t}}_i + \Delta ^*_{{\text {inf}}}]} p^*_{{\text {inf}}}(t|{\hat{t}}_i) \Delta ^*_{{\text {inf}}} \gamma . \end{aligned} \tag{3}$$

The likelihood (2) must then be calculated from both the onset record \(\tilde{{\mathbf {N}}}\) and the infection record \(\hat{{\mathbf {N}}} = ({\hat{N}}_1,\ldots ,{\hat{N}}_T)\). We can treat \(\hat{{\mathbf {N}}}\) as missing data linked to \(\tilde{{\mathbf {N}}}\) through \(p_{{\text {inc}}}(t|{\hat{t}})\). Given a known \(p_{{\text {inc}}}(t|{\hat{t}})\), multiple imputation produces MLEs of \(\gamma\) from repeatedly imputed records \(\hat{{\mathbf {N}}}^{(k)}\), \(k=1,\ldots ,K\). Each completed data set \((\tilde{{\mathbf {N}}},\hat{{\mathbf {N}}}^{(k)})\) is fitted, and the K estimates are then summarized. We refer to the model with the asymptomatic infection assumption as the modified Poisson transmission model.
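The imputation step can be sketched as follows. This is a minimal sketch under our own assumptions; the text does not specify the sampling scheme. Time is in days. Each case with onset on day t is assigned an infection day \(t - L\), with the incubation length L drawn independently from a daily incubation pmf `inc_pmf`. The function name `impute_infection_counts` is hypothetical.

```python
import numpy as np


def impute_infection_counts(onsets, inc_pmf, rng):
    """Draw one daily infection record N_hat^(k) consistent with the onset record.

    Each of the n cases with onset on day t is assigned infection day t - L, with
    incubation length L (in days) drawn from inc_pmf. The returned array starts
    `offset` days before the first onset day, so no imputed infection is lost.
    """
    onsets = np.asarray(onsets, dtype=int)
    offset = len(inc_pmf) - 1
    infections = np.zeros(len(onsets) + offset, dtype=int)
    for t, n in enumerate(onsets):
        if n == 0:
            continue
        # Split the n onsets of day t across incubation lengths 0, 1, ..., offset
        per_lag = rng.multinomial(n, inc_pmf)
        for lag, count in enumerate(per_lag):
            infections[t - lag + offset] += count
    return infections, offset
```

Calling it K times with a shared `rng = np.random.default_rng(seed)` gives the K completed data sets. Each one is then fitted with the modified hazard (3), and the K estimates are pooled.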

2.4 City Lockdown Intervention

We assume that the lockdown had an immediate effect that reduced transmissibility, plus a lasting effect that strengthened over time after the lockdown. The intervention effect can be modelled as

$$\begin{aligned} \theta (t) = \left( \theta _1 e^{ - \theta _2 (t-t_0)} \right) ^{I(t > t_0)}, \end{aligned} \tag{4}$$

where \(t_0\) is the start time of the intervention, \(\theta _1 \in [0,1]\) is the immediate reduction in infectivity and \(\theta _2>0\) is the rate of the lasting effect. To incorporate the intervention effect into the transmission model, Eq. (1) is modified to

$$\begin{aligned} {\hat{\lambda }}(t) = \sum _{i:t\in [{\tilde{t}}_i, {\tilde{t}}_i + \Delta _{{\text {inf}}}]} p_{{\text {inf}}}(t|{\tilde{t}}_i) \Delta _{{\text {inf}}} \gamma \theta (t). \end{aligned} \tag{5}$$

The intervention parameters \(\theta _1,\theta _2\) can be estimated by maximum likelihood, and Wald confidence intervals can be obtained by inverting the observed Fisher information matrix of the likelihood.
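Fitting Eq. (5) by maximum likelihood can be sketched as follows. This is a minimal sketch under our own assumptions, not the authors' code:

- Time is in days.
- `inf_pmf` and `inc_pmf` are daily pmfs that sum to one and have zero mass at lag 0.
- The parameters are \((R_0, \theta _1, \theta _2)\) with \(R_0 = \gamma \Delta _{{\text {inf}}}\).
- The optimizer is `scipy.optimize.minimize` with L-BFGS-B, using arbitrary bounds and starting values.
- The observed Fisher information is approximated by a central finite-difference Hessian of the negative log-likelihood.

All function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def theta(t, t0, theta1, theta2):
    """Eq. (4): 1 up to t0, then theta1 * exp(-theta2 * (t - t0))."""
    t = np.asarray(t, dtype=float)
    return np.where(t > t0, theta1 * np.exp(-theta2 * (t - t0)), 1.0)


def expected_onsets(params, onsets, inf_pmf, inc_pmf, t0):
    """Expected daily onsets when the infection hazard of Eq. (5) is scaled by theta(t)."""
    r0, theta1, theta2 = params
    n_days = len(onsets)
    # Infectious pressure on each day from cases with earlier onsets
    pressure = np.convolve(onsets, inf_pmf)[:n_days]
    infections = r0 * theta(np.arange(n_days), t0, theta1, theta2) * pressure
    # Infections on day tau become onsets after an incubation period
    return np.convolve(infections, inc_pmf)[:n_days]


def neg_loglik(params, onsets, inf_pmf, inc_pmf, t0):
    """Negative Poisson log-likelihood (2), dropping the log(N_t!) constant."""
    lam = expected_onsets(params, onsets, inf_pmf, inc_pmf, t0)
    used = lam > 0  # seed days that no earlier case can explain are conditioned on
    return -np.sum(onsets[used] * np.log(lam[used]) - lam[used])


def numerical_hessian(f, x, h=1e-4):
    """Central finite-difference Hessian of the scalar function f at x."""
    k = len(x)
    steps = np.eye(k) * h
    hess = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            hess[i, j] = (f(x + steps[i] + steps[j]) - f(x + steps[i] - steps[j])
                          - f(x - steps[i] + steps[j]) + f(x - steps[i] - steps[j])) / (4 * h * h)
    return hess


def fit_intervention(onsets, inf_pmf, inc_pmf, t0, start=(2.0, 0.5, 0.1)):
    """MLE of (R0, theta1, theta2) and Wald 95% CIs from the observed information."""
    onsets = np.asarray(onsets, dtype=float)
    nll = lambda p: neg_loglik(p, onsets, inf_pmf, inc_pmf, t0)
    scale = onsets.sum()  # rescaling the objective helps the optimizer's tolerances
    res = minimize(lambda p: nll(p) / scale, np.asarray(start, dtype=float),
                   method="L-BFGS-B",
                   bounds=[(0.1, 10.0), (1e-3, 1.0), (1e-4, 5.0)])
    cov = np.linalg.inv(numerical_hessian(nll, res.x))
    se = np.sqrt(np.diag(cov))
    return res.x, np.column_stack([res.x - 1.96 * se, res.x + 1.96 * se])
```

Parameterizing by \(R_0\) rather than \(\gamma\) is equivalent because \(\Delta _{{\text {inf}}}\) is fixed, so the interval for \(R_0\) needs no delta-method step.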

3 Analysis of the Outbreak’s Early Phase

We fitted the proposed transmission models to infer the basic reproduction number of COVID-19 in China (including Hong Kong, Macao and Taiwan) using daily onset counts from 1 December 2019 to 23 January 2020. The COVID-19 onset data came from two sources:

(1) daily onset records of confirmed and suspected cases from the China CDC report “Epidemic update and risk assessment of 2019 Novel Coronavirus” [2];

(2) daily onset records augmented from confirmed case reports of the National Health Commission of the People’s Republic of China using the confirmation delay distribution [5].

We restricted the fitted onset data to 23 January 2020 and earlier for two reasons:

(a) onset records after this cutoff were likely inaccurate at the time of manuscript preparation because of delayed confirmation;

(b) there was insufficient evidence to quantify the transmission-reducing effect of the strict lockdown and quarantine interventions implemented in the Greater Wuhan Region from 23–24 January 2020.

Several preliminary estimates of the natural history of the disease and of \(R_0\) were available for the COVID-19 outbreak [9, 10, 13, 17, 18]. We assumed \(\alpha _{{\text {inc}}} = 6\) and \(\beta _{{\text {inc}}} = 12\) for the beta-distributed incubation period, with a range of 1 to 14 days. The assumed mean incubation period of 5.67 days is similar to those reported for SARS and MERS [7, 8, 15].

The relative infectivity is assumed to vary over the infectious period, which is assumed to last 0–10 days after onset. The reason is that patients may reduce their mobility and contact with others once they develop alarming symptoms such as fever, fatigue and coughing.

We performed sensitivity analyses by varying the distributions of the incubation period and of relative infectivity over time. These assumptions, summarized in Fig. 2, include shorter or longer mean incubation periods as well as different times after symptom onset at which relative infectivity peaks. Model fit was evaluated by comparing observed and fitted daily onset counts.
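As a worked instance of the mean formula in Sect. 2.1, take \(\alpha _{{\text {inc}}} = 6\), \(\beta _{{\text {inc}}} = 12\) (so \(\alpha _{{\text {inc}}}/(\alpha _{{\text {inc}}}+\beta _{{\text {inc}}}) = 1/3\)) and \({\text {Inc}}_{\min } = 1\). The mean then depends on the width \(\Delta _{{\text {inc}}}\). A mean of 5.67 days corresponds to \(\Delta _{{\text {inc}}} = 14\), whereas reading the 1 to 14 day range as \(\Delta _{{\text {inc}}} = 13\) gives 5.33 days:

$$\begin{aligned} {\text {Inc}}_{\min } + \frac{\Delta _{{\text {inc}}} \alpha _{{\text {inc}}}}{\alpha _{{\text {inc}}}+\beta _{{\text {inc}}}} = 1 + \frac{\Delta _{{\text {inc}}}}{3} = \begin{cases} 1 + 14/3 \approx 5.67, & \Delta _{{\text {inc}}} = 14,\\ 1 + 13/3 \approx 5.33, & \Delta _{{\text {inc}}} = 13. \end{cases} \end{aligned}$$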

Fig. 2 Assumptions used to model the incubation period length distribution and relative infectivity following patient symptom onset for the COVID-19 outbreak in China from 1 December 2019 to 23 January 2020. No asymptomatic infectiousness is possible in this model, as cases may infect others only after symptom onset

For the modified transmission model, in which cases may be infectious before symptom onset, we assumed an infectious period of \(\Delta ^*_{{\text {inf}}} = 24\) days starting at \({\hat{t}}\), with relative infectivity peaking at or after symptom onset. The incubation period assumptions were the same as in the original model. Figure 3 summarizes the assumed incubation and relative infectivity distributions for the modified model.

Fig. 3 Modified assumptions used to model the incubation period length distribution and relative infectivity following infection time for the COVID-19 outbreak in China from 1 December 2019 to 23 January 2020. Asymptomatic infectiousness exists because the infectious period starts at the time of infection

3.1 Data Processing

The daily onset record from the China CDC report “Epidemic update and risk assessment of 2019 Novel Coronavirus” is publicly available in graphical form [2], which plots daily counts of both confirmed and suspected cases. We used Engauge Digitizer [11] to extract numeric onset counts from the plot for maximum likelihood estimation. To evaluate the impact of potential errors introduced during extraction, we performed sensitivity analyses. In these analyses, a random integer error drawn with equal probability from \(-\,3\) to 3 was added to each daily count after 1 January 2020. The onset counts for 2019 were taken from [9] and should be accurate as of the time of our modelling.
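The error-count sensitivity analysis can be sketched as below. Assumptions of this sketch: counts are a NumPy integer array indexed by day, `first_day` is the index of the first perturbed day, and perturbed counts are clipped at zero, since the text does not say how negative values were handled. The name `perturb_counts` is hypothetical.

```python
import numpy as np


def perturb_counts(counts, first_day, rng, max_error=3):
    """Return a copy of daily counts with uniform integer errors added from first_day on.

    Errors are drawn with equal probability from -max_error, ..., max_error.
    Clipping perturbed counts at zero is an assumption of this sketch.
    """
    counts = np.asarray(counts, dtype=int).copy()
    errors = rng.integers(-max_error, max_error + 1, size=len(counts) - first_day)
    counts[first_day:] = np.clip(counts[first_day:] + errors, 0, None)
    return counts
```

With day 0 set to 1 December 2019, 1 January 2020 is index 31. Whether that day itself was perturbed is not stated, so `first_day` is a modelling choice. Refitting \(R_0\) on many perturbed records and comparing the estimates with the unperturbed fit gives the kind of check described above.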

The second source of daily onset records was obtained by augmenting onset times \({\tilde{t}}\) from case confirmation times \({\bar{t}}\) using the confirmation delay distribution. We used a publicly available toolkit on GitHub [5] to generate bootstrap samples of augmented onset records. The delay between case onset and case ascertainment was modelled by fitting a geometric distribution to the “Kudos line list data” [14]. Sensitivity analyses were performed to account for the variability of the augmented onset records.
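The augmentation idea can be illustrated with a simplified sketch. It does not use the toolkit in [5] and does not reproduce its API. Assumptions of this sketch:

- Delays take values 0, 1, 2, ... and follow a geometric distribution with a hypothetical success probability `p_delay`.
- Each confirmed case is back-dated independently.
- Right truncation is ignored. Recent onsets whose cases are not yet confirmed are missing, which a full implementation should address.

```python
import numpy as np


def augment_onsets(confirmed, p_delay, rng):
    """Draw one augmented onset record by back-dating each confirmed case.

    A case confirmed on day t gets onset day t - D, with D geometric on
    {0, 1, 2, ...}. Onsets that would fall before day 0 are placed on day 0.
    """
    confirmed = np.asarray(confirmed, dtype=int)
    onsets = np.zeros_like(confirmed)
    for t, n in enumerate(confirmed):
        # numpy's geometric draws start at 1, so subtract 1 to allow zero delay
        delays = rng.geometric(p_delay, size=n) - 1
        days = np.maximum(t - delays, 0)
        np.add.at(onsets, days, 1)
    return onsets
```

Repeating the draw with the same generator yields the bootstrap-style augmented records. Each record is then fitted separately, and the resulting estimates are combined.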

3.2 Results

Under the baseline assumptions (black curves in Fig. 2), the estimated \(R_0\) and its \(95\%\) confidence interval were 2.47 (2.39, 2.55) from confirmed cases only, and 2.54 (2.49, 2.60) from the onset records of both confirmed and suspected cases in [2]. The added error counts had little impact on the estimated model parameters, so we report results assuming zero error. Under this assumption, the reported number of confirmed onset cases was 3442 by 23 January 2020, and the number of suspected plus confirmed cases was 8348.

Using onset records augmented from confirmation reports, the estimated \(R_0\) was 2.31 (2.25, 2.38), based on an average of 4940 augmented confirmed cases. The repeatedly augmented onset records yielded similar estimates of \(R_0\). The variance for the 95% CI was obtained by combining the estimates across augmented data sets, which accounts for the variability due to repeated augmentation.

Figure 4 shows, for each of the three data sources, the daily numbers of observed onsets, fitted onsets and fitted infections. The fitted onset trends match the observed data well, suggesting that the assumed incubation and relative infectivity parameters are reasonable.
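The exact rule used to combine the fits from the repeatedly augmented records is not given. Rubin's rules for multiple imputation are one standard choice, sketched below. The function name is hypothetical. The within-augmentation variances are assumed to be the squared Wald standard errors from each fit, and a normal reference distribution is used for simplicity.

```python
import numpy as np
from scipy.stats import norm


def combine_rubin(estimates, variances, level=0.95):
    """Pool K point estimates and their within-augmentation variances (Rubin's rules).

    Total variance = mean within variance + (1 + 1/K) * between variance.
    A normal reference distribution is used; Rubin's t-based degrees of
    freedom would give slightly wider intervals for small K.
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    k = len(q)
    q_bar = q.mean()
    within = u.mean()
    between = q.var(ddof=1) if k > 1 else 0.0
    total = within + (1 + 1 / k) * between
    half = norm.ppf(0.5 + level / 2) * np.sqrt(total)
    return q_bar, (q_bar - half, q_bar + half)
```

The between-augmentation term is what widens the interval beyond that of any single augmented data set.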

Fig. 4 Observed (bars) and fitted (red curve) numbers of cases with symptom onset over time, and fitted number of infected cases (blue curve) over time. The upper panels show results from the model that assumes the incubation period and latent period are the same. The lower panels show results from the modified model in which infectiousness can develop before symptom onset

We summarize the \(R_0\) estimates from the sensitivity analyses in Table 1. Under these scenarios, the estimated values of \(R_0\) fall between 2 and 3. This is in general agreement with the modelling literature [9, 10, 17, 18] and the China CDC report [2], despite differences in data sources and modelling approaches.

Relative to the baseline assumptions, a longer incubation period or a later infectivity peak led to higher estimates of \(R_0\). A shorter incubation period or an earlier infectivity peak led to lower estimates. This pattern is expected: for a fixed observed growth rate, a longer interval between the onsets of successive cases implies more secondary cases per case.

The fitted and observed daily onset counts agreed well in all sensitivity scenarios, because the assumed incubation period and relative infectivity distributions in these scenarios were generally close to those in the baseline scenario. The transmission model would be expected to fit the data poorly if the assumed natural history differed greatly from the truth.

Table 1 Estimated \(R_0\) and its 95% CI for the early phase of the 2019-nCoV outbreak in China

The \(R_0\) estimates from the modified Poisson transmission model were 2.88 (2.78, 2.95) from confirmed cases only and 2.97 (2.90, 3.04) from both confirmed and suspected cases [2]. With augmented onset records, \(R_0\) was estimated to be 2.69 (2.61, 2.77). The fitted and observed symptom onset records are also plotted in Fig. 4. Interestingly, the modified model fitted the data about as well as the original model, but its \(R_0\) estimates were about 17% higher. This difference mainly reflects the additional infectiousness during the incubation period in the modified model.

The sensitivity analyses for the modified transmission model are also shown in Table 1. This time, varying the incubation period distribution had very little impact on the \(R_0\) estimates. In contrast, the timing of the relative infectivity peak after infection strongly affected the estimates. When relative infectivity peaked about 9–10 days after infection, the estimated \(R_0\) was as high as 3.5–4.0.

4 Results with Updated Case Onsets in China

Updated data from China CDC [12] became available during the review of our first submission. These data include case symptom onset times up to 11 February 2020, and therefore more than two weeks of onset records following the lockdown of the Greater Wuhan Region from 23–24 January. This allowed us to estimate the intervention effect parameters of the updated transmission model.

A recently published study from China [4] estimated the median incubation period to be 4 days, with an interquartile range of 2–7 days. Of our assumed distributions, the shorter incubation period distribution is closest to these findings. Viral load measurements published in [19] indicated that viral levels were highest at or a few days after symptom onset. Because relative infectivity is expected to be closely associated with viral load, we considered the baseline or earlier relative infectivity assumptions more appropriate in light of these findings. We therefore fitted the updated onset data using the shorter incubation assumptions together with either the baseline or the earlier infectivity assumptions.

Fig. 5 Observed (bars) and fitted (red curve) numbers of cases with symptom onset over time, and fitted number of infected cases (blue curve) over time, from the updated data [12]. Results use the shorter incubation period and baseline infectivity assumptions. The purple vertical dotted line marks the start of the lockdown intervention

One reviewer suggested separating cases inside and outside Wuhan, as transmission may differ by region. The updated data make this possible, so we fitted the models separately to two data sets defined by place of onset. However, we found that the current model assumptions are not well suited to describing transmission outside Wuhan. About 68.6% of cases outside Wuhan had a Wuhan-related exposure history and brought the disease into other regions of China [12]. These cases should be treated as input cases in the model, and more detailed data are required for such an analysis. We therefore include only the results for cases within Wuhan in the main text. Results for cases outside Wuhan are given in the supplementary material.

Figure 5 shows the model fit to the observed onset curves for the Wuhan region from 8 December 2019 to 11 February 2020. As in the previous results, the fitted onset curves matched the observed curves relatively well. The intervention effect is evident: daily onsets began to decrease 4 to 5 days after the lockdown on 23 January 2020 (vertical purple dotted line).

Table 2 Estimated model parameters and their 95% CIs for COVID-19 outbreak in Wuhan with updated data [12]

The fitted model parameters are summarized in Table 2. Interestingly, the \(R_0\) estimates from the updated data, ranging from 2.7 to 4.2, tend to be higher than our previous results. Because the new results are based on more recent data and better-supported natural history assumptions, we consider them more reliable. The lockdown took effect gradually. We estimate that the effective reproduction number \(R_{eff} = R_0 \theta (t)\) fell below 1 on average 3.7–5.3 days after the lockdown began, after which the outbreak was brought under control.
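The time at which \(R_{eff}\) drops below 1 follows directly from Eq. (4). For \(t > t_0\), \(R_{eff}(t) = R_0 \theta _1 e^{-\theta _2 (t-t_0)}\) decreases monotonically. So, when \(R_0 \theta _1 > 1\),

$$\begin{aligned} R_0\,\theta _1 e^{-\theta _2 (t-t_0)} < 1 \iff t - t_0 > \frac{\ln (R_0\,\theta _1)}{\theta _2}. \end{aligned}$$

If instead \(R_0 \theta _1 \le 1\), \(R_{eff}\) is already below 1 for every \(t > t_0\). Substituting the fitted \(R_0\), \(\theta _1\) and \(\theta _2\) for a scenario gives the corresponding number of days after the lockdown.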

5 Discussion

A timely understanding of the transmission of an emerging infectious disease is crucial for assessing the size and impact of an outbreak and for informing intervention strategies to contain it. We fitted a relatively simple transmission model that needs only basic knowledge of the disease’s natural history and daily records of symptom onset to estimate \(R_0\). Our inferred \(R_0\) falls in the same range as results from other researchers [9, 10, 17, 18] who used different sources of outbreak data.

Our model assumed that the outbreak took place in a homogeneous population with random mixing of individuals. This is reasonable because most early-phase COVID-19 transmission occurred in the Greater Wuhan area. Input cases in other Chinese cities and abroad accounted for only a small proportion of all infections by 23 January 2020. Our approach does not rely on traffic volume data. When such data are available, they could be modelled as geographical covariates that modify the infection hazard.

The inferred transmissibility was similar whether we used the China CDC epidemiological onset records or onset records augmented from case confirmation times to account for reporting delay. Ignoring the reporting delay may substantially bias estimates of \(R_0\), because the apparent exponential growth rate of cumulative confirmed cases can be much higher than the true growth rate.

We considered potential asymptomatic infectiousness in the modified model. More recent evidence indicates that infection without symptoms may occur [1, 6]. We therefore emphasize that these modified assumptions may be crucial not only for understanding disease transmissibility but, to an even greater extent, for future models that evaluate intervention strategies whose effects depend on timing.

Nevertheless, the current approach faces challenges, mainly because of the limited understanding of the emerging COVID-19 outbreak and the lack of complete, accurate data.

First, the \(R_0\) estimates were sensitive to the assumed natural history of the disease. We relied on contact histories from the current outbreak and on knowledge of previous coronavirus outbreaks such as SARS and MERS to formulate reasonable assumptions.

Second, whether asymptomatic infection occurs in COVID-19 is still unclear. Under the baseline assumptions, the \(R_0\) estimates from the modified model, which allows infectiousness before symptom onset, were about 0.4 higher than those from the model that assumes no asymptomatic infection. Updated case reports on potential asymptomatic infection will be important for choosing sensible model assumptions. If asymptomatic transmission occurs in the COVID-19 outbreak, much more stringent interventions would be required, such as quarantining all individuals with potential exposure rather than only those with a known contact history with confirmed cases.

Third, the cases currently reported by China CDC may be constrained by limited medical resources, including hospital beds, health care workers, protective equipment and viral RNA testing capacity. Infections and deaths may therefore be underreported, which is one of the biggest challenges for transmission modelling. Limited resources may not only lead to underestimated case numbers but also distort the distributions of reporting delay and relative infectivity.

As the outbreak continues to expand to more regions within China and internationally, the homogeneous mixing assumption must be adapted to account for heterogeneity in transmission conditions over time and space. Individual-level covariates should be incorporated into the transmission dynamics. Time-varying effects and spatial parameters that modify the local intensity of infection can be estimated once richer outbreak data become available. It is especially critical to evaluate how effectively the travel ban in the Greater Wuhan Region and future interventions, such as quarantine sites or promising medications, reduce the effective reproduction number of COVID-19. We included lockdown effects in our model fitted to the updated data. The lockdown was shown to bring the outbreak under control within 4 to 5 days of the intervention, albeit at massive economic and social cost. Other countries are very unlikely to be able to replicate the exact draconian interventions implemented in China, so careful evaluation and implementation of various control strategies remain critically important.