1 Introduction

Since December 2019, Coronavirus Disease 2019 (COVID-19) has spread to all provinces in China [1]. In March 2020, the Chinese government brought the COVID-19 epidemic under control by implementing the first-level response to public health emergencies (FLRPHE) and strict control measures [2]. However, COVID-19 spread rapidly around the world, and 100,221,840 confirmed cases and 2,154,967 deaths had been reported as of January 27, 2021.

Table 1 The important events related to the outbreak of COVID-19 in Harbin

An outbreak of COVID-19 in Harbin was caused by an imported case; the first related case was diagnosed and two asymptomatic cases were found on April 9 [3,4,5]. As of April 30, 68 confirmed cases and 23 asymptomatic cases had been reported in the Harbin outbreak. On March 19, an American student (H) carrying COVID-19 returned to Harbin from New York and then infected her neighbor (C) during the confinement period. Through infection within the family, G was infected with COVID-19 and then infected C while dining together. Subsequently, C, who had cerebral apoplexy, sought medical care at the Second Hospital in Harbin and the First Affiliated Hospital of Harbin Medical University. As a result, many patients, health care providers, doctors and nurses were infected, which led to the local outbreak of COVID-19 in Harbin. The important events related to the outbreak are shown in Table 1.

Mathematical modeling is commonly used to explore the transmission of diseases such as COVID-19 and to predict the trend of an epidemic from previously reported information [6,7,8,9,10,11]. From the cases confirmed outside mainland China by January 18, Imai et al. [12] inferred the epidemic size of COVID-19 in Wuhan. Using public information and mathematical models, Wu et al. [13] estimated the clinical severity of COVID-19. Tang et al. [14] evaluated the epidemic size in China and assessed the effect of control measures. With the help of a mathematical model and public information, Song et al. [9] estimated the epidemic size of COVID-19 in China and predicted a potential second epidemic. Subsequently, Song et al. [11] established a mathematical model, computed the basic reproduction number, and estimated the epidemic size of COVID-19 in Wuhan as of January 23, 2020. In addition, the bilinear neural network method can be used to study differential equations [15, 16]. The transmission of COVID-19 in China has been studied extensively [9, 11,12,13,14], but to date there is no study on mathematical modeling of COVID-19 transmission in Harbin.

To study the spread of COVID-19 in Harbin, the basic reproduction number and the effective reproduction number were computed. The basic reproduction number is defined as the expected number of secondary cases produced by a single infection in a completely susceptible population [17, 18], whereas the effective reproduction number is the mean number of secondary cases an infected person can cause in a population where some immunity or some interventions are in place [19]. Computing the basic reproduction number is useful for estimating the transmission capacity of COVID-19 when an outbreak occurs. However, as a result of interventions such as FLRPHE and isolation measures, the effective reproduction number changes over time, so it must also be investigated in order to combat COVID-19 [20]. Most studies investigated the spread of COVID-19 using a susceptible-infectious-recovered (SIR) or susceptible-exposed-infectious-recovered (SEIR) model [13, 14, 21,22,23,24,25,26,27,28]. For example, Liu et al. [26] used the SIR model to predict the cumulative number of reported cases up to its final size. Wu et al. [21] used the SEIR model to forecast the potential domestic and international spread of COVID-19 from Wuhan, assuming that infected people in the incubation period were not infectious. Here, we establish a susceptible-unfound infected-found infected-removed (SIFR) model, where the unfound infected people include infected people in the incubation period and unfound asymptomatic and symptomatic infected people.

To estimate the epidemic size of COVID-19 and assess the effectiveness of interventions in Harbin, a mathematical model explaining the transmission dynamics of COVID-19 is established. The basic reproduction number and the effective reproduction number are computed, and dynamics are analyzed by rigorous mathematical analysis. Using the mathematical model and public information, the epidemic size of COVID-19 in Harbin is estimated and the effect of interventions on the transmission of COVID-19 in Harbin is evaluated.

The rest of the paper is organized as follows. In Sect. 2, the mathematical model is established and its dynamics are analyzed. Section 3 gives the parameter estimation. In Sect. 4, the epidemic size of COVID-19 in Harbin is estimated and the effectiveness of interventions is assessed. Section 5 presents the discussion and conclusion.

2 Mathematical modeling of COVID-19 transmission in Harbin and dynamic analysis

Based on the found infected cases in Harbin in April 2020, a mathematical model is used to compute the basic reproduction number, estimate the epidemic size of COVID-19 in Harbin, and assess the effect of interventions on the transmission of COVID-19 in Harbin.

The report of the WHO-China Joint Mission on COVID-19 [29] shows that infected people in the incubation period are infectious. The model rests on the following assumptions. Natural births and deaths are not considered, since the epidemic period is short. Unfound infected people have the same infectivity as infectious people with COVID-19, whereas found infected people are quarantined and cannot infect healthy people. Removed people are not infected with COVID-19 again. The unfound infected people include infected people in the incubation period and unfound asymptomatic and symptomatic infected people.

The population (N) is divided into susceptible people (S), unfound infected people (I), found infected people (F) and removed people (R), so that \(N(t)=S(t)+I(t)+F(t)+R(t)\). Susceptible people become infected at the transmission rate \(\beta \) after contact with unfound infected people. Unfound infected people are found at the found rate p. Unfound and found infected people are removed at the rates \(\gamma _1\) and \(\gamma _2\), respectively. Here, the removed rate includes the recovery rate and the disease-induced death rate. The flow diagram is shown in Fig. 1.

Fig. 1 The transmission diagram of COVID-19 in Harbin, where S, I, F and R represent the susceptible, unfound infected, found infected and removed people

Based on the epidemiological patterns of COVID-19 in Harbin and previous work [21, 28, 30], the transmission dynamics of COVID-19 in Harbin are described by the following differential equations

$$\begin{aligned} \left\{ \begin{aligned} \displaystyle \frac{\mathrm {d}S(t)}{\mathrm {d}t}=&-\frac{\beta S(t) I(t)}{S(t)+I(t)+F(t)+R(t)},\\ \displaystyle \frac{\mathrm {d}I(t)}{\mathrm {d}t}=&\frac{\beta S(t) I(t)}{S(t)+I(t)+F(t)+R(t)}\\ \displaystyle&-\gamma _1 I(t)-p I(t),\\ \displaystyle \frac{\mathrm {d}F(t)}{\mathrm {d}t}=&p I(t) -\gamma _2 F(t),\\ \displaystyle \frac{\mathrm {d}R(t)}{\mathrm {d}t}=&\gamma _1 I(t)+\gamma _2 F(t), \end{aligned} \right. \end{aligned}$$
(1)

with the nonnegative initial values

$$\begin{aligned} \begin{aligned}&S(0)>0, I(0)\ge 0, F(0)\ge 0 \\&\mathrm{and} \ \ R(0)\ge 0. \end{aligned} \end{aligned}$$
(2)
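As an illustration of system (1), the following minimal sketch integrates the four equations with a fixed-step classical Runge-Kutta scheme. The function names `sifr_rhs`, `rk4_step` and `simulate_sifr`, the step size and the choice of integrator are assumptions made for this example and are not taken from the paper; the parameter values and initial state are those quoted in the text for Fig. 2.

```python
def sifr_rhs(y, beta, p, gamma1, gamma2):
    """Right-hand side of system (1); y = (S, I, F, R)."""
    S, I, F, R = y
    new_infections = beta * S * I / (S + I + F + R)
    return (
        -new_infections,                      # dS/dt
        new_infections - gamma1 * I - p * I,  # dI/dt
        p * I - gamma2 * F,                   # dF/dt
        gamma1 * I + gamma2 * F,              # dR/dt
    )


def rk4_step(f, y, dt):
    """One classical fourth-order Runge-Kutta step for y' = f(y)."""
    k1 = f(y)
    k2 = f(tuple(a + 0.5 * dt * b for a, b in zip(y, k1)))
    k3 = f(tuple(a + 0.5 * dt * b for a, b in zip(y, k2)))
    k4 = f(tuple(a + dt * b for a, b in zip(y, k3)))
    return tuple(a + dt / 6.0 * (b + 2 * c + 2 * d + e)
                 for a, b, c, d, e in zip(y, k1, k2, k3, k4))


def simulate_sifr(y0, beta, p, gamma1, gamma2, t_end, dt=0.01):
    """Return the list of states (S, I, F, R) at t = 0, dt, 2*dt, ..., t_end."""
    f = lambda y: sifr_rhs(y, beta, p, gamma1, gamma2)
    y = tuple(float(v) for v in y0)
    trajectory = [y]
    for _ in range(int(round(t_end / dt))):
        y = rk4_step(f, y, dt)
        trajectory.append(y)
    return trajectory


# Parameter values and initial state quoted in the text for Fig. 2.
trajectory = simulate_sifr((25, 11, 4, 0), beta=0.1403, p=0.1128,
                           gamma1=0.1, gamma2=0.1, t_end=100.0)
S_end, I_end, F_end, R_end = trajectory[-1]
print(f"S={S_end:.3f} I={I_end:.6f} F={F_end:.6f} R={R_end:.3f}")
```

Because the four right-hand sides sum to zero, the total \(S+I+F+R\) should stay at its initial value along the computed trajectory, which gives a quick sanity check on any implementation.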

Theorem 2.1

For system (1) with nonnegative initial values (2), the solutions are nonnegative and ultimately bounded.

Proof

By Theorem 5.2.1 in [31], the solutions S(t), I(t), F(t) and R(t) are nonnegative for all \(t\ge 0\).

From the first equation of model (1), we obtain

$$\begin{aligned} \frac{\mathrm {d}S(t)}{\mathrm {d}t} \le 0. \end{aligned}$$

Thus, \(\limsup _{t\rightarrow +\infty }S(t)\le S(0)\), which means that S(t) is ultimately bounded.

From the equations of model (1), we have

$$\begin{aligned} \frac{\mathrm {d}N(t)}{\mathrm {d}t}= 0. \end{aligned}$$

Then \(N(t)=N_0\) for all \(t\ge 0\), which, together with nonnegativity, means that I(t), F(t) and R(t) are ultimately bounded. The proof is completed. \(\square \)

For system (1), the disease-free equilibrium is \(E_0=(S_0, 0, 0, R_0)\) with \(S_0>0\) and \(R_0>0\), where \(R_0\) in \(E_0\) denotes the equilibrium number of removed people. With the help of the next generation matrix theory [17, 18], the basic reproduction number, also denoted \(R_0\), is calculated as

$$\begin{aligned} R_0= \frac{\beta }{\gamma _1+p}. \end{aligned}$$

Then the effective reproduction number \(R_e(t)\) is

$$\begin{aligned} R_e(t)= \frac{\beta S(t)}{(\gamma _1+p)(S(t)+I(t)+F(t)+R(t))}. \end{aligned}$$
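The two formulas above can be evaluated directly. The short sketch below does so for the two parameter sets used in the simulations of this section; the function names are hypothetical and chosen only for this example.

```python
def basic_reproduction_number(beta, gamma1, p):
    """R_0 = beta / (gamma_1 + p)."""
    return beta / (gamma1 + p)


def effective_reproduction_number(beta, gamma1, p, S, I, F, R):
    """R_e(t) = beta * S / ((gamma_1 + p) * (S + I + F + R))."""
    return beta * S / ((gamma1 + p) * (S + I + F + R))


# Parameter sets quoted in the text for the simulations of Sect. 2.
r0_low = basic_reproduction_number(beta=0.1403, gamma1=0.1, p=0.1128)
r0_high = basic_reproduction_number(beta=0.6403, gamma1=0.1, p=0.1128)
re_start = effective_reproduction_number(0.6403, 0.1, 0.1128,
                                         S=173, I=1, F=0, R=0)
print(f"R_0 (beta=0.1403) = {r0_low:.3f}")    # about 0.66, below 1
print(f"R_0 (beta=0.6403) = {r0_high:.3f}")   # about 3.01, above 1
print(f"R_e at (173, 1, 0, 0) = {re_start:.3f}")
```

Since \(S(t)\le N(t)\), the effective reproduction number never exceeds \(R_0\), and it falls as susceptible people are depleted or as the found rate p increases.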

In what follows, the stability of the disease-free equilibrium \(E_0\) is proved.

Theorem 2.2

For system (1),

(i) The disease-free equilibrium \(E_0\) is locally asymptotically stable if \(R_0<1\).

(ii) If \(R_0<1\), then the disease-free equilibrium \(E_0\) is globally asymptotically stable.

Proof

Since the stability of S(t) and R(t) is determined by that of I(t) and F(t), and \(N(t)=S(t)+I(t)+F(t)+R(t)\), system (1) can be reduced to the following equations

$$\begin{aligned} \left\{ \begin{aligned} \displaystyle \frac{\mathrm {d}I(t)}{\mathrm {d}t}=&\frac{\beta S(t) I(t)}{S(t)+I(t)+F(t)+R(t)}\\&-\gamma _1 I(t)-p I(t),\\ \displaystyle \frac{\mathrm {d}F(t)}{\mathrm {d}t}=&p I(t) -\gamma _2 F(t). \end{aligned} \right. \end{aligned}$$
(3)

The Jacobian matrix of system (3) at \(E_0\) is obtained, where

$$\begin{aligned} {J}= \begin{pmatrix} \beta -\gamma _1-p &{} 0 \\[1ex] p &{} -\gamma _2 \\[1ex] \end{pmatrix}. \end{aligned}$$

Thus, the characteristic equation at \(E_0\) is

$$\begin{aligned} (\lambda -\beta +\gamma _1+p)(\lambda +\gamma _2)=0. \end{aligned}$$

If \(R_0<1\), then both roots \(\lambda _1=\beta -\gamma _1-p\) and \(\lambda _2=-\gamma _2\) are negative. Therefore, the disease-free equilibrium \(E_0\) is locally asymptotically stable.

Now we prove the global stability of the disease-free equilibrium \(E_0\). Define the Lyapunov function

$$\begin{aligned} L(t)=I(t). \end{aligned}$$

Apparently, \(L(t)\ge 0 (\forall t\ge 0)\). Differentiating L(t) along the solutions of system (3) yields

$$\begin{aligned} \begin{aligned} \frac{\mathrm {d}L(t)}{\mathrm {d}t}&\le \beta I(t)-\gamma _1 I(t)-p I(t)\\&=(\gamma _1+p) I(t) (R_0-1). \end{aligned} \end{aligned}$$

When \(R_0<1\), \(L'(t)\le 0\). Furthermore, the largest compact invariant set in \(\{L'(t)= 0\}\) is the singleton \(\{E_0\}\). Using the LaSalle invariance principle [32] and the local stability of \(E_0\), the disease-free equilibrium point \(E_0\) is globally asymptotically stable. The proof is completed. \(\square \)

Following [33], we next derive the epidemic size and the peak value of COVID-19 in Harbin.

The sum of the first and second equations of system (1) is

$$\begin{aligned} (S+I)'=-(\gamma _1+p)I, \end{aligned}$$
(4)

which means that \(I_{\infty }=\lim _{t\rightarrow \infty }I(t)=0\) and \(\lim _{t\rightarrow \infty }(S(t)+I(t))=S_{\infty }\). Integration of the equation (4) from 0 to \(\infty \) leads to

$$\begin{aligned} (\gamma _1+p)\int _0^{\infty }I(t)\mathrm {d}t= S_0+I_0-S_{\infty }. \end{aligned}$$

From the first equation of model (1) and \(N(t)=N_0\), we have

$$\begin{aligned} \begin{aligned} \ln \frac{S_0}{S_{\infty }}&= \frac{\beta }{N_0} \int _0^{\infty }I(t)\mathrm {d}t\\&=\frac{\beta }{N_0(\gamma _1+p)} (S_0+I_0-S_{\infty })\\&=R_0 \frac{S_0+I_0-S_{\infty }}{N_0}. \end{aligned} \end{aligned}$$
(5)

Equation (5) gives the relation between the basic reproduction number and the epidemic size of COVID-19.
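Relation (5) defines \(S_{\infty }\) only implicitly, so in practice it has to be solved numerically. The sketch below does this by bisection; the function name `final_susceptible`, the bracketing interval and the tolerance are assumptions made for this example, and the numbers are the illustrative values used for Fig. 3 rather than the fitted Harbin values.

```python
import math


def final_susceptible(S0, I0, N0, R0, tol=1e-12):
    """Solve ln(S0 / x) = R0 * (S0 + I0 - x) / N0 for x = S_inf in (0, S0).

    Assumes I0 > 0, so that the residual is negative at x = S0, and assumes
    R0 is moderate, so that the residual is positive at the lower bracket.
    """
    if I0 <= 0:
        raise ValueError("I0 must be positive")

    def residual(x):
        return math.log(S0 / x) - R0 * (S0 + I0 - x) / N0

    lo, hi = 1e-12 * S0, float(S0)   # residual(lo) > 0 > residual(hi)
    while hi - lo > tol * S0:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative values from the text: (S(0), I(0)) = (173, 1), N_0 = 174,
# beta = 0.6403, gamma_1 = 0.1, p = 0.1128.
S0, I0, N0 = 173.0, 1.0, 174.0
R0 = 0.6403 / (0.1 + 0.1128)
S_inf = final_susceptible(S0, I0, N0, R0)
epidemic_size = S0 + I0 - S_inf
print(f"S_inf = {S_inf:.2f}, epidemic size = {epidemic_size:.2f}")
```

Bisection is used here only because it is simple and robust on a bracketed root; any scalar root finder would serve equally well.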

Furthermore, integration of the first equation of model (1) from 0 to t yields

$$\begin{aligned} \begin{aligned} \ln \frac{S_0}{S(t)}&= \frac{\beta }{N_0} \int _0^t I(s)\mathrm {d}s\\&=\frac{\beta }{N_0(\gamma _1+p)} [S_0+I_0-S(t)-I(t)], \end{aligned} \end{aligned}$$

and then

$$\begin{aligned} \begin{aligned}&S(t)+I(t)-\frac{N_0(\gamma _1+p)}{\beta }\ln S(t)\\&=S_0+I_0-\frac{N_0(\gamma _1+p)}{\beta }\ln S_0. \end{aligned} \end{aligned}$$

When the derivative of I is zero (that is, \(S=\frac{N_0(\gamma _1+p)}{\beta }\)), we obtain the maximum number of infectives which is

$$\begin{aligned} \begin{aligned} I_{\max }&=S_0+I_0-\frac{N_0(\gamma _1+p)}{\beta }\ln S_0\\&\quad -\frac{N_0(\gamma _1+p)}{\beta }+\frac{N_0(\gamma _1+p)}{\beta }\ln \frac{N_0(\gamma _1+p)}{\beta }. \end{aligned} \end{aligned}$$
(6)
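Formula (6) can be checked against a direct numerical integration of the S and I equations. The following sketch evaluates (6) and compares it with the largest value of I found along a Runge-Kutta trajectory; the function names, step size and time horizon are assumptions made for this example, and the numbers are the illustrative values quoted in the text for Fig. 3.

```python
import math


def peak_from_formula(S0, I0, N0, beta, gamma1, p):
    """Evaluate formula (6); if S0 <= N0*(gamma1+p)/beta, I never grows."""
    c = N0 * (gamma1 + p) / beta
    if S0 <= c:
        return float(I0)
    return S0 + I0 - c * math.log(S0) - c + c * math.log(c)


def peak_from_simulation(S0, I0, N0, beta, gamma1, p, t_end=60.0, dt=0.01):
    """Largest I(t) on a fixed-step RK4 trajectory of the S and I equations.

    N(t) = N0 is constant, so F and R are not needed to evolve S and I.
    """
    def rhs(S, I):
        new_infections = beta * S * I / N0
        return -new_infections, new_infections - (gamma1 + p) * I

    S, I = float(S0), float(I0)
    peak = I
    for _ in range(int(round(t_end / dt))):
        a1, b1 = rhs(S, I)
        a2, b2 = rhs(S + 0.5 * dt * a1, I + 0.5 * dt * b1)
        a3, b3 = rhs(S + 0.5 * dt * a2, I + 0.5 * dt * b2)
        a4, b4 = rhs(S + dt * a3, I + dt * b3)
        S += dt / 6.0 * (a1 + 2 * a2 + 2 * a3 + a4)
        I += dt / 6.0 * (b1 + 2 * b2 + 2 * b3 + b4)
        peak = max(peak, I)
    return peak


args = dict(S0=173.0, I0=1.0, N0=174.0, beta=0.6403, gamma1=0.1, p=0.1128)
i_max_formula = peak_from_formula(**args)
i_max_simulated = peak_from_simulation(**args)
print(f"formula: {i_max_formula:.3f}, simulation: {i_max_simulated:.3f}")
```

The two values should agree up to the discretization error of the integrator, since (6) follows from the conserved quantity derived just above it.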

Simulations are carried out to verify the theoretical results. We set the parameters \(\beta =0.1403\), \(p=0.1128\), \(\gamma _1=0.1\), \(\gamma _2=0.1\) and the initial value \((S(0), I(0), F(0), R(0))=(25, 11, 4, 0)\). Figure 2 shows that the disease-free equilibrium \(E_0\) is stable when \(R_0<1\). Susceptible and removed people approach \(S_0\) and \(R_0\), respectively.

Fig. 2 The solution behavior of the model (1). When \(R_0<1\), the disease-free equilibrium \(E_0\) is stable

We next set the parameters \(\beta =0.6403\), \(p=0.1128\), \(\gamma _1=0.1\), \(\gamma _2=0.1\) and the initial value \((S(0), I(0), F(0), R(0))=(173, 1, 0, 0)\). Figure 3 gives the relation between S and I, which describes the orbits of the solutions of the model (1) in the (S, I) plane. The maximum number of infectives is attained when the derivative of I is zero.

Fig. 3 The relation between S and I describes the orbits of the solutions of the model (1) in the (S, I) plane

Table 2 Related parameters and initial values in Harbin

3 Parameters estimation

3.1 Data source

Data on found infected cases of COVID-19 in Harbin from April 9 to April 30, 2020, were obtained from the Health Commission of Heilongjiang Province [3]. The data set includes the cumulative number of found infected cases, newly found infected cases and cured cases. All data used are publicly available.

3.2 Parameters estimation

As of April 30, 68 confirmed cases and 23 asymptomatic cases were reported from the COVID-19 outbreak in Harbin [3]. According to the study in [34], the average time of treatment is 10 days (\(\gamma _1=0.1\) and \(\gamma _2=0.1\)). As of April 9, one confirmed case and three asymptomatic cases were reported. Then the initial values on April 9 are \(F(0)=4, R(0)=0\).

The initial values of two state variables and two unknown parameters were estimated using Bayesian methods. A multivariate Gaussian was chosen as the prior distribution of the two unknown parameters. The posterior means, obtained by the Markov Chain Monte Carlo (MCMC) method, were taken as the estimates of the initial values and parameters. Specifically, based on the mathematical model and the confirmed cases, the parameters \(\beta \) and p and the initial values S(0) and I(0) were estimated by MCMC with the adaptive Metropolis–Hastings algorithm, using 20,000 iterations and a 10,000-iteration burn-in period [35]. The mean value, standard deviation (STD) and 95% confidence interval (95% CI) are given in Table 2.
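To make the estimation idea concrete, the sketch below runs a plain random-walk Metropolis sampler for \(\beta \) and p. It is a simplified stand-in and not the procedure of [35]: the proposal is not adaptive, the prior is flat on a bounded positive box rather than Gaussian, S(0) and I(0) are held fixed, the observation noise level `sigma` is arbitrary, and the "data" are synthetic cumulative found cases generated from the model itself with the illustrative parameters of Fig. 3. All function names and tuning constants are assumptions made for this example.

```python
import math
import random

GAMMA1 = GAMMA2 = 0.1             # removal rates quoted in the text
STATE0 = (173.0, 1.0, 0.0, 0.0)   # illustrative (S, I, F, R), not the fitted values
DAYS = 22                         # April 9 to April 30


def cumulative_found(beta, p, dt=0.5):
    """Daily cumulative found cases C(t) = F(0) + integral of p*I, via RK4."""
    def rhs(y):
        S, I, F, R, C = y
        inf = beta * S * I / (S + I + F + R)
        return (-inf, inf - (GAMMA1 + p) * I, p * I - GAMMA2 * F,
                GAMMA1 * I + GAMMA2 * F, p * I)

    def step(y):
        k1 = rhs(y)
        k2 = rhs(tuple(a + 0.5 * dt * b for a, b in zip(y, k1)))
        k3 = rhs(tuple(a + 0.5 * dt * b for a, b in zip(y, k2)))
        k4 = rhs(tuple(a + dt * b for a, b in zip(y, k3)))
        return tuple(a + dt / 6.0 * (b + 2 * c + 2 * d + e)
                     for a, b, c, d, e in zip(y, k1, k2, k3, k4))

    y = STATE0 + (STATE0[2],)     # append C(0) = F(0)
    series = [y[4]]
    for _ in range(DAYS):
        for _ in range(int(round(1.0 / dt))):
            y = step(y)
        series.append(y[4])
    return series


def log_posterior(theta, data, sigma=3.0):
    """Gaussian log-likelihood with a flat prior on 0 < beta, p < 5."""
    beta, p = theta
    if not (0.0 < beta < 5.0 and 0.0 < p < 5.0):
        return -math.inf
    model = cumulative_found(beta, p)
    return -sum((m - d) ** 2 for m, d in zip(model, data)) / (2.0 * sigma ** 2)


def metropolis(data, theta0, n_iter=2000, burn_in=1000,
               step=(0.01, 0.003), seed=0):
    """Random-walk Metropolis; returns (posterior mean, acceptance rate, chain)."""
    rng = random.Random(seed)
    theta, logp = tuple(theta0), log_posterior(theta0, data)
    chain, accepted = [], 0
    for _ in range(n_iter):
        proposal = tuple(x + rng.gauss(0.0, s) for x, s in zip(theta, step))
        logp_new = log_posterior(proposal, data)
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            theta, logp = proposal, logp_new
            accepted += 1
        chain.append(theta)
    kept = chain[burn_in:]
    mean = tuple(sum(t[i] for t in kept) / len(kept) for i in range(2))
    return mean, accepted / n_iter, chain


synthetic = cumulative_found(0.6403, 0.1128)   # synthetic, noise-free "observations"
mean, acc_rate, chain = metropolis(synthetic, theta0=(0.5, 0.15))
print(f"posterior mean beta={mean[0]:.3f}, p={mean[1]:.3f}, acceptance={acc_rate:.2f}")
```

With real surveillance data, the synthetic series would be replaced by the reported cumulative found cases, and an adaptive proposal as in [35] would remove the need to hand-tune the step sizes.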

3.3 Fitting results

To account for the uncertainty of the estimated parameters and initial values, the MCMC method is used to evaluate the performance of model (1) with the estimates in Table 2. Figure 4 shows the estimated cumulative infected cases and the real data of COVID-19 in Harbin. The simulations are consistent with the reported cases, which supports the validity of our model.

Fig. 4 The estimated cumulative infectious cases of COVID-19 in Harbin

4 The spread of COVID-19 and the effect of interventions in Harbin

4.1 Estimating the spread of COVID-19 in Harbin

Simulation results for the estimated cumulative infected cases are shown in Fig. 4. The computed basic reproduction number on April 9, 2020, is 3.6. Figure 4 shows that the cumulative number of infected people reached 174, of whom 94 were found and 80 were unfound.

Fig. 5 The effective reproduction number \(R_e(t)\) in Harbin

4.2 The effect of interventions on the spread of COVID-19 in Harbin

The effective reproduction number \(R_e(t)\) for our model is shown in Fig. 5. On April 9, the effective reproduction number was 3.6, which was the maximum of \(R_e(t)\). As time went on, \(R_e(t)\) decreased quickly and fell below the threshold value 1 after April 15. Subsequently, \(R_e(t)\) reached its minimum of 0.04 on April 30, far below the threshold value 1. An outbreak of COVID-19 is under control when the effective reproduction number \(R_e(t)\) is less than 1. This implies that the outbreak of COVID-19 in Harbin in April 2020 was brought under control once effective interventions were implemented.

5 Discussion and conclusion

After the first case was reported in Harbin on April 9, 2020, the COVID-19 outbreak caused 68 confirmed cases and 23 asymptomatic cases in Harbin as of April 30, 2020. In this paper, using public information and our mathematical model, we estimated the size of the COVID-19 outbreak in Harbin in April 2020. The results show that the cumulative number of infected people reached 174, of whom 94 were found and 80 were unfound. To assess the effectiveness of interventions in Harbin, the effective reproduction number was estimated based on public information and our mathematical model.

This is the first study to estimate the transmission potential of the COVID-19 outbreak in Harbin in April 2020. The cumulative number of infected people finally reached 174, of whom 54% were found and 46% were not. Although all tracked close contacts were tested, some infected people were not found. These unfound infected people might be in the incubation period, or might be unfound asymptomatic or symptomatic cases. Therefore, ignoring unfound infected people is dangerous for public health, and vigilance against them must be maintained.

Our findings indicate that the effective reproduction number on April 9 was 3.6, which is consistent with the values estimated for China [14, 21, 30, 36]. By comparison, the estimated basic reproduction number was 6.8 in Hubei province [34] and 3.6 in New York [37]. However, once effective interventions were implemented by the Heilongjiang provincial government, the effective reproduction number \(R_e(t)\) dropped drastically and finally reached 0.04, far below the threshold value 1. This suggests that the outbreak of COVID-19 in Harbin in April 2020 was under control, with no subsequent outbreak in Harbin.

The mathematical model used in this study is analogous to the transmission dynamics models of COVID-19 in [12, 13, 22,23,24,25,26,27,28, 38,39,40,41,42,43,44]. The SIFR model allows us to estimate the cumulative number of infected cases of COVID-19 in Harbin in April and the effective reproduction number from the found infected cases. Nonetheless, there are several limitations. First, we assumed that infected people in the incubation period have the same infectivity as infectious people with COVID-19, which led to an overestimate of the outbreak size in Harbin. Second, the detailed interventions were not incorporated into our model, which might also lead to overestimating the outbreak size in Harbin in April 2020. Third, very little is known about the effect of temperature and precipitation on the transmission of COVID-19. Our estimates could be unreliable if temperature and precipitation have a strong impact on transmission.

Although the transmission of COVID-19 has been brought under control in China, the epidemic situation around the world remains serious. An increasing number of imported infected cases have entered China and a growing number of asymptomatic infected people have been found, which might increase the risk of local outbreaks of COVID-19 in China. Therefore, we should stay alert to the possibility that unfound infected people cause local outbreaks in China, such as the outbreak of COVID-19 in Harbin in April 2020 [9, 45]. The resurgence of COVID-19 in Beijing was likely caused by polluted environment-to-human transmission from food via cold chain logistics [46]. This transmission route brings new challenges, and we leave the assessment of its effect on the transmission of COVID-19 as future work. In addition, the differences in the spread of COVID-19 among hospitals, and further modeling and verification for other cities in China, pose challenges for both analysis and regression, which we also leave as future work.