1 Introduction

Based on uncertainty theory (Liu 2007), Liu (2008) introduced uncertain differential equations as a type of differential equation involving uncertain processes. Under linear growth and Lipschitz conditions, Chen and Liu (2010) proved an existence and uniqueness theorem for the solution of an uncertain differential equation. Following that, Gao (2012) proved the theorem again under local linear growth and Lipschitz conditions. Furthermore, an analytic solution of linear uncertain differential equations was derived by Chen and Liu (2010), and some analytic methods for nonlinear uncertain differential equations were presented by Liu (2012) and Yao (2013b). Yao and Chen (2013) made an important contribution by verifying that the solution of an uncertain differential equation can be represented by a family of solutions of ordinary differential equations (this result was later named the Yao–Chen formula), and methods for calculating the extreme value, first hitting time and time integral of the solution of an uncertain differential equation were then provided by Yao (2013a). To estimate the unknown parameters of an uncertain differential equation so that it fits the observed data as closely as possible, several methods have been proposed, for example, the method of moments (Yao and Liu 2020), least squares estimation (Sheng et al. 2019), generalized moment estimation (Liu 2020b), uncertain maximum likelihood (Liu and Liu 2020), and minimum cover estimation (Yang et al. 2020).

Recently, many scholars have applied uncertain statistics to modelling the COVID-19 pandemic. For instance, Liu (2020a) used uncertain regression analysis to forecast the cumulative numbers of COVID-19 infections in China, while Ye and Yang (2020) used uncertain time series. Following that, Chen et al. (2020) presented an uncertain SIR model, and Jia and Chen (2020) proposed an uncertain SEIAR model by employing high-dimensional uncertain differential equations.

However, there are still two challenges in this topic. The first one is how to estimate the zero-day of COVID-19 spread in China. This is the problem of initial value estimation for uncertain differential equations. The second one is how to estimate the parameters of uncertain differential equations based on observed data when the parameters are time-varying. This is the problem of time-varying parameter estimation.

The rest of this paper is organized as follows. Section 2 will define a concept of \(\alpha \)-region of solution for uncertain differential equations, and Sect. 3 will present a problem of initial value estimation for uncertain differential equations and propose an estimation method. The cumulative numbers of COVID-19 infections in China will be surveyed in Sect. 4, and a COVID-19 spread model based on uncertain differential equation will be derived in Sect. 5. In Sect. 6, the method of moments will be recast for estimating the time-varying parameters of the COVID-19 spread model. Section 7 will infer the zero-day of COVID-19 spread in China. Section 8 will show that stochastic COVID-19 spread model is not suitable. Finally, Sect. 9 will provide a brief conclusion.

2 The \(\alpha \)-region of solution

The \(\alpha \)-region of solution of an uncertain differential equation is defined as the set that the solutions may fall in.

Definition 1

Let \(\alpha \) be given with \(\alpha \ge 0.5\). Suppose \(X_t^\alpha \) and \(X_t^{1-\alpha }\) are the \(\alpha \)-path and \((1-\alpha )\)-path of an uncertain differential equation

$$\begin{aligned} \mathrm{d}X_t = f(t,X_t) \mathrm{d}t + g(t,X_t) \mathrm{d}C_t \end{aligned}$$
(1)

with initial value \(x_{t_0}\), respectively. Then the set

$$\begin{aligned} \begin{aligned} \displaystyle S^\alpha (t_0,x_{t_0}) = \{ (t,x) \in \mathfrak {R}^{2} | \ X_{t}^{1-\alpha } \le x \le X_{t}^{\alpha }, \ t \ge t_0\} \end{aligned} \end{aligned}$$
(2)

is said to be the \(\alpha \)-region of solution with respect to \(x_{t_0}\) for the uncertain differential equation (1).

Example 1

Let \(\alpha \) be given with \(\alpha \ge 0.5\). For the uncertain differential equation

$$\begin{aligned} \mathrm{d}X_t = a \mathrm{d}t + b \mathrm{d}C_t, \end{aligned}$$
(3)

with initial value \(x_0=0\), since its \(\alpha \)-path and \((1-\alpha )\)-path are

$$\begin{aligned} X_t^\alpha =at+|b| \varPhi ^{-1} (\alpha ) t \end{aligned}$$

and

$$\begin{aligned} X_t^{1-\alpha }=at+|b| \varPhi ^{-1} (1-\alpha ) t, \end{aligned}$$

respectively, the \(\alpha \)-region of solution with respect to \(x_0=0\) for the uncertain differential equation (3) is

$$\begin{aligned} \displaystyle S^\alpha (0,0) = \{ (t,x) \in \mathfrak {R}^2 | \ at+|b| \varPhi ^{-1} (1-\alpha ) t \le x \le at+|b| \varPhi ^{-1} (\alpha ) t, \ t \ge 0\} \end{aligned}$$

where

$$\begin{aligned} \varPhi ^{-1}(\alpha )=\frac{\sqrt{3}}{\pi } \ln \frac{\alpha }{1-\alpha }. \end{aligned}$$

Example 2

Let \(\alpha \) be given with \(\alpha \ge 0.5\). For the uncertain differential equation

$$\begin{aligned} \mathrm{d}X_t = aX_t \mathrm{d}t + bX_t \mathrm{d}C_t \end{aligned}$$
(4)

with initial value \(x_0=1\), since its \(\alpha \)-path and \((1-\alpha )\)-path are

$$\begin{aligned} X_t^\alpha =\exp (at+|b| \varPhi ^{-1} (\alpha ) t) \end{aligned}$$

and

$$\begin{aligned} X_t^{1-\alpha }=\exp (at+|b| \varPhi ^{-1} (1-\alpha ) t), \end{aligned}$$

respectively, the \(\alpha \)-region of solution with respect to \(x_0=1\) for the uncertain differential equation (4) is

$$\begin{aligned}&\displaystyle S^\alpha (0,1) = \{ (t,x) \in \mathfrak {R}^2 | \ \exp (at+|b| \varPhi ^{-1} (1-\alpha ) t)\\&\le x \le \exp (at+|b| \varPhi ^{-1} (\alpha ) t), \ t \ge 0\} \end{aligned}$$

where

$$\begin{aligned} \varPhi ^{-1}(\alpha )=\frac{\sqrt{3}}{\pi } \ln \frac{\alpha }{1-\alpha }. \end{aligned}$$
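The closed-form regions of Examples 1 and 2 can be checked numerically. The following is a minimal sketch, not part of the original derivation; the function names `phi_inv`, `in_region_example1` and `in_region_example2` are hypothetical and chosen here for illustration only.

```python
import math

def phi_inv(alpha: float) -> float:
    """Inverse standard normal uncertainty distribution, as given in the text."""
    return math.sqrt(3.0) / math.pi * math.log(alpha / (1.0 - alpha))

def in_region_example1(t, x, a, b, alpha):
    """Is (t, x) in S^alpha(0, 0) for dX_t = a dt + b dC_t (Example 1)?"""
    if t < 0:
        return False
    lower = a * t + abs(b) * phi_inv(1.0 - alpha) * t
    upper = a * t + abs(b) * phi_inv(alpha) * t
    return lower <= x <= upper

def in_region_example2(t, x, a, b, alpha):
    """Is (t, x) in S^alpha(0, 1) for dX_t = a X_t dt + b X_t dC_t (Example 2)?"""
    if t < 0:
        return False
    lower = math.exp(a * t + abs(b) * phi_inv(1.0 - alpha) * t)
    upper = math.exp(a * t + abs(b) * phi_inv(alpha) * t)
    return lower <= x <= upper

# Illustrative parameter values (assumptions, not taken from the paper).
print(in_region_example1(2.0, 0.0, a=1.0, b=1.0, alpha=0.95))
print(in_region_example2(10.0, 2.7, a=0.1, b=0.02, alpha=0.95))
```

Because \(\varPhi ^{-1}(1-\alpha ) = -\varPhi ^{-1}(\alpha )\), the region widens symmetrically around the path with \(\alpha =0.5\) (in the exponent, for Example 2) as \(\alpha \) grows.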

To obtain the \(\alpha \)-region of solution, the core problem is to compute the \(\alpha \)-path of uncertain differential equation. In order to do it, some numerical methods were designed, for example, Euler method (Yao and Chen 2013), Runge–Kutta method (Yang and Shen 2015) and Adams method (Yang and Ralescu 2015).
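As an illustration of the Euler approach, recall that by the Yao–Chen formula the \(\alpha \)-path solves the ordinary differential equation \(\mathrm{d}X_t^\alpha = f(t,X_t^\alpha )\mathrm{d}t + |g(t,X_t^\alpha )| \varPhi ^{-1}(\alpha )\mathrm{d}t\). The sketch below applies a plain forward Euler scheme to that equation; the name `euler_alpha_path` and the step count are assumptions made here, not the cited authors' implementation.

```python
import math

def phi_inv(alpha):
    """Inverse standard normal uncertainty distribution."""
    return math.sqrt(3.0) / math.pi * math.log(alpha / (1.0 - alpha))

def euler_alpha_path(f, g, t0, x0, t_end, alpha, steps=1000):
    """Forward Euler approximation of X_{t_end}^alpha starting from X_{t0} = x0."""
    h = (t_end - t0) / steps
    c = phi_inv(alpha)
    t, x = t0, x0
    for _ in range(steps):
        # One Euler step of dX = f dt + |g| * Phi^{-1}(alpha) dt
        x += (f(t, x) + abs(g(t, x)) * c) * h
        t += h
    return x

# Example 2 with illustrative values a = 0.1, b = 0.02 (assumptions).
a, b, alpha = 0.1, 0.02, 0.95
approx = euler_alpha_path(lambda t, x: a * x, lambda t, x: b * x,
                          0.0, 1.0, 10.0, alpha, steps=20000)
exact = math.exp((a + abs(b) * phi_inv(alpha)) * 10.0)
print(approx, exact)
```

For Example 2 the Euler value can be compared with the closed-form \(\alpha \)-path, which gives a simple way to choose the step count before applying the scheme to equations without an analytic solution.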

3 Initial value estimation

Assume an uncertain process follows an uncertain differential equation and some realizations of this process are observed. How to estimate the initial value of the process from the uncertain differential equation and the observed data is a problem of practical interest.

Definition 2

Suppose an uncertain process \(X_t\) follows an uncertain differential equation

$$\begin{aligned} \mathrm{d}X_t = f(t,X_t) \mathrm{d}t + g(t,X_t) \mathrm{d}C_t, \end{aligned}$$
(5)

and \(x_{t_1},x_{t_2},\ldots ,x_{t_n}\) are the observed data of \(X_t\) at the times \(t_1,t_2,\ldots ,t_n\), respectively. For any given confidence level \(\alpha \ge 0.5\), the set

$$\begin{aligned} \begin{aligned} \displaystyle \ O^\alpha = \{ (t_0,x_{t_0})&| \ (t_i,x_{t_i})\in S^\alpha (t_0,x_{t_0}), \ i=1,2,\ldots ,n\} \end{aligned} \end{aligned}$$
(6)

is said to be the \(\alpha \)-region of initial value with respect to the observed data \(x_{t_1},x_{t_2},\ldots ,x_{t_n}\) for the uncertain differential equation (5), where \(S^\alpha (t_0,x_{t_0})\) is the \(\alpha \)-region of solution with respect to \(x_{t_0}\).

The following algorithm provides a way to judge whether \((t_0,x_{t_0}) \in O^\alpha \) or not.

Algorithm 1

Step 1: Compute the \(\alpha \)-path \(X_t^\alpha \) and the \((1-\alpha )\)-path \(X_t^{1-\alpha }\) of the uncertain differential equation (5) with initial value \(x_{t_0}\) at time \(t_0\) by the Euler method, the Runge–Kutta method or the Adams method.

Step 2: Set \(i=0\).

Step 3: Set \(i \leftarrow i+1\).

Step 4: If \(X_{t_i}^{\alpha } < x_{t_i}\) or \(X_{t_i}^{1-\alpha } > x_{t_i}\), then output \((t_0,x_{t_0}) \notin O^{\alpha }\) and stop.

Step 5: If \(i < n\), then go to Step 3.

Step 6: Output \((t_0,x_{t_0}) \in O^{\alpha }\).
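The steps of Algorithm 1 can be sketched as follows. This is a minimal illustration that uses a forward Euler scheme for Step 1 and assumes the Yao–Chen representation of the \(\alpha \)-path; the function names, the step size `h` and the test data are hypothetical.

```python
import math

def phi_inv(alpha):
    """Inverse standard normal uncertainty distribution."""
    return math.sqrt(3.0) / math.pi * math.log(alpha / (1.0 - alpha))

def alpha_paths_at(f, g, t0, x0, times, alpha, h=0.01):
    """Step 1: Euler values (X_t^{1-alpha}, X_t^alpha) at sorted times >= t0."""
    c_hi, c_lo = phi_inv(alpha), phi_inv(1.0 - alpha)
    t, lo, hi = t0, x0, x0
    out = []
    for ti in times:
        n = max(1, math.ceil((ti - t) / h))
        step = (ti - t) / n
        for _ in range(n):
            lo += (f(t, lo) + abs(g(t, lo)) * c_lo) * step
            hi += (f(t, hi) + abs(g(t, hi)) * c_hi) * step
            t += step
        t = ti  # remove accumulated rounding in the time variable
        out.append((lo, hi))
    return out

def in_initial_value_region(f, g, t0, x0, observations, alpha, h=0.01):
    """Steps 2-6: True if (t0, x0) belongs to O^alpha, False otherwise."""
    obs = sorted(observations)
    if obs and obs[0][0] < t0:
        return False  # S^alpha(t0, x0) only contains points with t >= t0
    paths = alpha_paths_at(f, g, t0, x0, [t for t, _ in obs], alpha, h)
    for (_, xi), (lo, hi) in zip(obs, paths):
        if hi < xi or lo > xi:  # Step 4: observation outside the region
            return False
    return True

# Hypothetical data for dX_t = 0.1 X_t dt + 0.02 X_t dC_t.
f = lambda t, x: 0.1 * x
g = lambda t, x: 0.02 * x
data = [(5.0, 1.7), (10.0, 2.8)]
print(in_initial_value_region(f, g, 0.0, 1.0, data, 0.95))
print(in_initial_value_region(f, g, 0.0, 2.0, data, 0.95))
```

Scanning a grid of candidate pairs \((t_0,x_{t_0})\) with this test gives a discrete picture of \(O^\alpha \).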

4 Cumulative numbers of COVID-19 infections in China

The cumulative numbers of COVID-19 infections in China, excluding imported cases, from January 20 to March 15, 2020 were reported by the National Health Commission of China and summarized by Liu (2020a) and Ye and Yang (2020). See Table 1.

Table 1 Cumulative numbers of COVID-19 infections in China excluding imported cases from January 20 to March 15, 2020

Let \(1,2,\ldots ,56\) represent the dates (t) from January 20 to March 15. For example, \(t=1\) and \(t=56\) represent January 20 and March 15, respectively. Also let \(x_1,x_2,\ldots ,x_{56}\) represent the cumulative numbers on dates \(1,2,\ldots ,56\), respectively. For example,

$$\begin{aligned} x_1=291, \quad x_{56}=80737. \end{aligned}$$

Based on the observed data of cumulative numbers of COVID-19 infections in China, Liu (2020a) obtained the fitted logistic growth model

$$\begin{aligned} x_t=\frac{80858}{1+22.741 \exp (-0.179t)} \end{aligned}$$
(7)

where \(x_t\) is the cumulative number of COVID-19 infections in China on date t.
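Model (7) is easy to evaluate directly. The short sketch below (the function name `logistic_fit` is an assumption) computes the fitted cumulative numbers for dates \(t=11,\ldots ,24\), the range for which Sect. 6 replaces reported data by fitted values.

```python
import math

def logistic_fit(t):
    """Fitted logistic growth model (7) of Liu (2020a)."""
    return 80858.0 / (1.0 + 22.741 * math.exp(-0.179 * t))

# Fitted cumulative numbers for January 30 (t = 11) to February 12 (t = 24).
fitted = {t: round(logistic_fit(t)) for t in range(11, 25)}
for t, value in fitted.items():
    print(t, value)
```

Since the coefficients in (7) are rounded, the printed values may differ from the paper's figures by a few cases.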

5 COVID-19 spread model

The effective reproductive rate refers to the relative rate of change of the cumulative number per unit of time. Let \(R_t\) denote the effective reproductive rate and \(X_t\) denote the cumulative number of COVID-19 infections in China at time t. During a small time interval \([t,t+\varDelta t]\), we should have

$$\begin{aligned} R_t = \frac{X_{t+\varDelta t} - X_{t} }{X_{t} \varDelta t }. \end{aligned}$$
(8)

Now we assume

$$\begin{aligned} R_t = \mu _t + \sigma _t \cdot \text {``}\text{ Noise }\text {''} \end{aligned}$$
(9)

where \(\mu _t\), \(\sigma _t\) are real-valued functions with respect to time t, and “Noise” is a standard normal uncertain variable. Based on uncertainty theory, let us represent the “Noise” by

$$\begin{aligned} \frac{C_{t+\varDelta t} - C_t}{\varDelta t} \end{aligned}$$

where \(C_t\) is a Liu process (Liu 2009). Then we have

$$\begin{aligned} R_t = \mu _t + \sigma _t \frac{C_{t+\varDelta t} - C_t}{\varDelta t}. \end{aligned}$$
(10)

It follows from (8) and (10) that

$$\begin{aligned} X_{t+\varDelta t} - X_{t} =R_t X_{t} \varDelta t = \mu _t X_t \varDelta t + \sigma _t X_t (C_{t+\varDelta t} -C_{t}). \end{aligned}$$
(11)

Generally, during a time interval [0, t] with a partition \(0=t_0< t_1< \cdots < t_n =t\), we have

$$\begin{aligned} \begin{aligned} X_t - X_0&= \sum \limits _{i=0}^{n-1} (X_{t_{i+1}} - X_{t_{i}}) \\&=\sum \limits _{i=0}^{n-1} \mu _{t_i}X_{t_{i}} (t_{i+1}-t_i) + \sum \limits _{i=0}^{n-1} \sigma _{t_i} X_{t_{i}}(C_{t_{i+1}}-C_{t_i}) \\&\rightarrow \int _0^t \mu _sX_s \mathrm{d}s + \int _0^t \sigma _s X_s\mathrm{d}C_s \end{aligned} \end{aligned}$$

as

$$\begin{aligned} \max \limits _{0\le i \le n-1} | t_{i+1}-t_i| \rightarrow 0. \end{aligned}$$

That is,

$$\begin{aligned} X_t - X_{0} = \int _0^t \mu _sX_s \mathrm{d}s + \int _0^t \sigma _s X_s\mathrm{d}C_s. \end{aligned}$$
(12)

Thus we obtain a COVID-19 spread model based on uncertain differential equation,

$$\begin{aligned} \mathrm{d}X_{t} = \mu _t X_{t} \mathrm{d}t + \sigma _t X_{t} \mathrm{d}C_{t} \end{aligned}$$
(13)

where \(X_t\) is the cumulative number of COVID-19 infections in China at time t, \(C_t\) is a Liu process, and \(\mu _t\) and \(\sigma _t\) are time-varying parameters that are still unknown at this point.

6 Time-varying parameter estimation

The cumulative numbers of COVID-19 infections in China before \(t=25\) (February 13, 2020) are not real-time data due to the capacity limitation of nucleic acid testing. However, to estimate the time-varying parameters in the COVID-19 spread model, it is insufficient to only use the observed data of cumulative numbers after February 13, 2020,

$$\begin{aligned} x_{25},x_{26},\ldots ,x_{56}. \end{aligned}$$

Therefore, we have to add data from the date when the isolation policy of the Chinese government became effective, i.e., from January 30 to February 12, 2020. According to the fitted logistic growth model (7), we reassign

$$\begin{aligned} \begin{aligned} 19369, 22127, 25117, 28316, 31691, 35199, 38789, \\ 42405, 45990, 49487, 52847, 56029, 58998, 61733 \ \end{aligned} \end{aligned}$$

to \(x_{11}, x_{12}, \ldots , x_{24}\), respectively. By using the data,

$$\begin{aligned} x_{11},x_{12},\ldots ,x_{56}, \end{aligned}$$

we will estimate the time-varying parameters \(\mu _t\) and \(\sigma _t\) in the COVID-19 spread model (13). For this purpose, the method of moments (Yao and Liu 2020) will be recast as follows.

First, let us estimate \(\mu _{11}\) and \(\sigma _{11}\) on January 30, 2020 (\(t=11\)) by applying the 10 observed data \(x_{11},x_{12},\ldots ,x_{20}\). The COVID-19 spread model (13) has a difference form

$$\begin{aligned} X_{t_{i+1}} = X_{t_{i}} + {\check{\mu }}_{11} X_{t_i} (t_{i+1}-t_{i}) + {\check{\sigma }}_{11} X_{t_i} (C_{t_{i+1}}-C_{t_i}), \end{aligned}$$

i.e.,

$$\begin{aligned} \frac{X_{t_{i+1}} - X_{t_{i}} - {\check{\mu }}_{11} X_{t_i} (t_{i+1}-t_{i}) }{ {\check{\sigma }}_{11} X_{t_i} (t_{i+1}-t_i)} = \frac{C_{t_{i+1}}-C_{t_i}}{t_{i+1}-t_i} \end{aligned}$$

for \(i=11,12,\ldots ,19\). Since

$$\begin{aligned} \frac{C_{t_{i+1}}-C_{t_i}}{t_{i+1}-t_i} \end{aligned}$$

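identically follow a standard normal uncertainty distribution, we get

$$\begin{aligned} \frac{X_{t_{i+1}} - X_{t_{i}} - {\check{\mu }}_{11} X_{t_i} (t_{i+1}-t_{i}) }{ {\check{\sigma }}_{11} X_{t_i} (t_{i+1}-t_i)} \sim {\mathcal {N}}(0,1) \end{aligned}$$

for \(i=11,12,\ldots ,19\), where \({\mathcal {N}}(0,1)\) denotes the standard normal uncertainty distribution. Substitute \(X_{t_i}\) and \(X_{t_{i+1}}\) with the observed data \(x_{t_i}\) and \(x_{t_{i+1}}\) in the above equation, and write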

$$\begin{aligned} h_{i} ({\check{\mu }}_{11},{\check{\sigma }}_{11}) = \frac{x_{t_{i+1}} - x_{t_{i}} - {\check{\mu }}_{11} x_{t_i} (t_{i+1}-t_{i}) }{ {\check{\sigma }}_{11} x_{t_i} (t_{i+1}-t_i)} \end{aligned}$$
(14)

for \(i=11,12,\ldots ,19\). The values \(h_{i}({\check{\mu }}_{11},{\check{\sigma }}_{11}), \ i=11,12,\ldots ,19\) can be regarded as 9 samples of the standard normal uncertainty distribution. Their first two sample moments are

$$\begin{aligned} \frac{1}{9} \sum \limits _{i=11}^{19} h_{i} ({\check{\mu }}_{11},{\check{\sigma }}_{11}) \quad \text{ and }\quad \frac{1}{9} \sum \limits _{i=11}^{19} h_{i}^2 ({\check{\mu }}_{11},{\check{\sigma }}_{11}), \end{aligned}$$

and the first two population moments of the standard normal uncertainty distribution are 0 and 1. Since the number of unknown parameters is 2, the moment estimate is then obtained by equating the first two sample moments to the corresponding first two population moments. In other words, the estimate \(({\check{\mu }}_{11},{\check{\sigma }}_{11})\) should solve the system of equations,

$$\begin{aligned} \left\{ \begin{array}{ll} \displaystyle \frac{1}{9} \sum \limits _{i=11}^{19} \frac{x_{t_{i+1}} - x_{t_{i}} - {\check{\mu }}_{11} x_{t_i} (t_{i+1}-t_{i}) }{ {\check{\sigma }}_{11} x_{t_i} (t_{i+1}-t_i)}=0 \\ [0.4cm] \displaystyle \frac{1}{9} \sum \limits _{i=11}^{19} \left( \frac{x_{t_{i+1}} - x_{t_{i}} - {\check{\mu }}_{11} x_{t_i} (t_{i+1}-t_{i}) }{ {\check{\sigma }}_{11} x_{t_i} (t_{i+1}-t_i)}\right) ^2=1 \end{array} \right. \end{aligned}$$
(15)

whose root is \(({\check{\mu }}_{11},{\check{\sigma }}_{11}) = (0.1101,0.0216)\).

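Next, let us estimate \(\mu _{12}\) and \(\sigma _{12}\) on the date \(t=12\) by applying the 10 observed data \(x_{12},x_{13},\ldots ,x_{21}\). Since

$$\begin{aligned} \frac{X_{t_{i+1}} - X_{t_{i}} - {\check{\mu }}_{12} X_{t_i} (t_{i+1}-t_{i}) }{ {\check{\sigma }}_{12} X_{t_i} (t_{i+1}-t_i)} \sim {\mathcal {N}}(0,1) \end{aligned}$$

for \(i=12,13,\ldots ,20\), where \({\mathcal {N}}(0,1)\) denotes the standard normal uncertainty distribution, by the method of moments, we have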

$$\begin{aligned} \left\{ \begin{array}{ll} \displaystyle \frac{1}{9} \sum \limits _{i=12}^{20} \frac{x_{t_{i+1}} - x_{t_{i}} - {\check{\mu }}_{12} x_{t_i} (t_{i+1}-t_{i}) }{ {\check{\sigma }}_{12} x_{t_i} (t_{i+1}-t_i)}=0 \\ [0.4cm] \displaystyle \frac{1}{9} \sum \limits _{i=12}^{20} \left( \frac{x_{t_{i+1}} - x_{t_{i}} - {\check{\mu }}_{12} x_{t_i} (t_{i+1}-t_{i}) }{ {\check{\sigma }}_{12} x_{t_i} (t_{i+1}-t_i)}\right) ^2=1 \end{array} \right. \end{aligned}$$

whose root is \(({\check{\mu }}_{12},{\check{\sigma }}_{12}) = (0.1018,0.0219)\).
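With daily data (\(t_{i+1}-t_i=1\)), the two moment equations can be solved in closed form: writing \(r_i=(x_{t_{i+1}}-x_{t_i})/x_{t_i}\), the first equation gives \({\check{\mu }}\) as the mean of the \(r_i\), and the second gives \({\check{\sigma }}\) as the square root of the mean of \((r_i-{\check{\mu }})^2\). The sketch below applies this to the reassigned values \(x_{11},\ldots ,x_{24}\) listed above; the function name `moment_estimate` is hypothetical.

```python
import math

# x_11, ..., x_24: values reassigned from the fitted logistic model (7).
x = [19369, 22127, 25117, 28316, 31691, 35199, 38789,
     42405, 45990, 49487, 52847, 56029, 58998, 61733]

def moment_estimate(window, dt=1.0):
    """Solve the two moment equations for one window of equally spaced data."""
    # Relative increments per unit time, one per consecutive pair.
    r = [(b - a) / (a * dt) for a, b in zip(window, window[1:])]
    mu = sum(r) / len(r)                       # first moment equals 0
    sigma = math.sqrt(sum((ri - mu) ** 2 for ri in r) / len(r))  # second equals 1
    return mu, sigma

# Sliding windows of 10 observations starting at t = 11 and t = 12.
estimates = {t: moment_estimate(x[t - 11:t - 1]) for t in (11, 12)}
for t, (mu, sigma) in estimates.items():
    print(t, round(mu, 4), round(sigma, 4))
```

The same function can be slid along later windows once the reported data \(x_{25},\ldots ,x_{56}\) from Table 1 are appended to the list.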

Analogously, we can get the estimated values \(({\check{\mu }}_{13},{\check{\sigma }}_{13}), ({\check{\mu }}_{14},{\check{\sigma }}_{14}),\ldots ,({\check{\mu }}_{47},{\check{\sigma }}_{47})\) shown in Table 2.

Table 2 Estimated values for \(\mu _t\) and \(\sigma _t\)

The basic reproductive rate refers to the effective reproductive rate when COVID-19 started spreading naturally in a completely susceptible population. Since it can be considered that COVID-19 spread naturally in China before January 30, 2020 \((t=11)\), we regard

$$\begin{aligned} R_{11} = {\check{\mu }}_{11} + {\check{\sigma }}_{11} {\dot{C}}_{11} = 0.1101 + 0.0216{\dot{C}}_{11} \end{aligned}$$

as the basic reproductive rate of COVID-19 spread in China. In order to fit \(\mu _t\) and \(\sigma _t\), we may employ logistic decay models,

$$\begin{aligned} \mu _t = \frac{0.1101}{1+\beta _1 \exp (\beta _2 t)}, \quad \sigma _t = \frac{0.0216}{1+\beta _3 \exp (\beta _4 t)} \end{aligned}$$
(16)

where \(\beta _{1},\beta _{2},\beta _{3}\) and \(\beta _{4}\) are unknown parameters. By applying least squares estimation to the samples \(({\check{\mu }}_i,{\check{\sigma }}_i), \ i=11,12,\ldots ,47\) in Table 2, we get the time-varying parameters,

$$\begin{aligned} \mu _t = \frac{0.1101}{1+0.0083 \exp (0.2567 t)}, \quad \sigma _t = \frac{0.0216}{1+0.0034 \exp (0.2312 t)}. \end{aligned}$$
(17)

It follows from (13) and (17) that the COVID-19 spread model based on uncertain differential equation is

$$\begin{aligned} \begin{aligned} \mathrm{d}X_{t}&= \frac{0.1101X_{t}\mathrm{d}t}{1+0.0083\exp (0.2567t)} +\frac{0.0216 X_{t} \mathrm{d}C_{t}}{1+0.0034\exp (0.2312t)} \end{aligned} \end{aligned}$$
(18)

where \(X_t\) is the cumulative number of COVID-19 infections in China at time t, and \(C_t\) is a Liu process.

7 Zero-day of COVID-19 spread in China

Note that the cumulative number \(X_t\) of COVID-19 infections in China follows COVID-19 spread model (18), and \(x_{25},x_{26},\ldots ,x_{56}\) in Table 1 are observed data of \(X_t\) at the times \(25,26,\ldots ,56\), respectively. Taking \(\alpha =0.95\) and applying Algorithm 1, we obtain the 0.95-region of initial value, \(O^{\alpha }\), of the COVID-19 spread model that is shown by the shaded area in Fig. 1.

Fig. 1 The 0.95-region of initial value of COVID-19 spread model (shaded area)

The zero-day of COVID-19 spread in China is the day when the earliest case (not the earliest confirmed case) of COVID-19 occurred in China. Suppose there was only one infectious case on zero-day, i.e., \(x_{t_0}=1\). Then the zero-day \((t_0)\) of COVID-19 spread in China is given by the slice of \(O^\alpha \) corresponding to \(x_{t_0}=1\), i.e., the interval

$$\begin{aligned} \{ t_0 \in \mathfrak {R}| \ (t_0,1) \in O^\alpha \} = -94 \pm 36. \end{aligned}$$
(19)

That means, the zero-day of COVID-19 spread in China is

$$\begin{aligned} \hbox {October 17, 2019 }\pm \hbox { 36 days.} \end{aligned}$$

It is concluded that, roughly speaking, COVID-19 started spreading in China from October 17, 2019.
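A partial consistency check of this result is possible with the numbers quoted in the text. Because model (18) is linear in \(X_t\) and \(X_t>0\), the Yao–Chen formula gives \(\ln X_t^\alpha = \ln x_{t_0} + \int _{t_0}^t \mu _s \mathrm{d}s + \varPhi ^{-1}(\alpha ) \int _{t_0}^t \sigma _s \mathrm{d}s\), and both integrals have closed forms. The sketch below is an illustration under these assumptions, with hypothetical function names. It tests only the single observation \(x_{56}=80737\), so it is a necessary condition for \((t_0,1)\in O^{0.95}\), not the full Algorithm 1, which needs all of \(x_{25},\ldots ,x_{56}\) from Table 1.

```python
import math

def phi_inv(alpha):
    """Inverse standard normal uncertainty distribution."""
    return math.sqrt(3.0) / math.pi * math.log(alpha / (1.0 - alpha))

def decay_integral(c, beta, gamma, t):
    """Antiderivative of c / (1 + beta * exp(gamma * s)), evaluated at s = t."""
    return c * (t - math.log1p(beta * math.exp(gamma * t)) / gamma)

def log_alpha_path(t0, x0, t, alpha):
    """ln X_t^alpha for model (18) with X_{t0} = x0 > 0 (closed form)."""
    drift = (decay_integral(0.1101, 0.0083, 0.2567, t)
             - decay_integral(0.1101, 0.0083, 0.2567, t0))
    spread = (decay_integral(0.0216, 0.0034, 0.2312, t)
              - decay_integral(0.0216, 0.0034, 0.2312, t0))
    return math.log(x0) + drift + phi_inv(alpha) * spread

def consistent_with(t0, x0, t, x, alpha=0.95):
    """Is the observation (t, x) inside S^alpha(t0, x0)?"""
    if t < t0:
        return False
    lo = log_alpha_path(t0, x0, t, 1.0 - alpha)
    hi = log_alpha_path(t0, x0, t, alpha)
    return lo <= math.log(x) <= hi

# x_56 = 80737 is the only late observation quoted in the text.
for t0 in (-130, -94, -58, -50):
    print(t0, consistent_with(t0, 1.0, 56, 80737))
```

Candidate zero-days that fail this single-point test can be discarded immediately; those that pass still have to be checked against the remaining observations.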

8 Why is stochastic COVID-19 spread model not suitable?

If the Liu process \(C_t\) in the COVID-19 spread model (18) is replaced with a Wiener process \(W_t\), then we obtain a stochastic differential equation

$$\begin{aligned} \mathrm{d}X_{t} = \frac{0.1101X_{t}\mathrm{d}t}{1+0.0083\exp (0.2567t)} +\frac{0.0216 X_{t} \mathrm{d}W_{t}}{1+0.0034\exp (0.2312t)}. \end{aligned}$$
(20)

Suppose there was only one infectious case on October 17, 2019 (i.e., \(t_0=-94\) and \(x_{t_0}=1\)). Taking a date, e.g., \(t=30\) (February 18, 2020), we have

$$\begin{aligned} \Pr \{ X_{30+\varDelta t} < X_{30} \} \ge 46.22 \% \end{aligned}$$

when \(\varDelta t=10^{-6}\). That means that, under the stochastic model, the cumulative number of COVID-19 infections in China decreases with a probability of at least \(46.22\%\). However, the cumulative number \(X_t\) can never decrease with respect to t. Hence the stochastic COVID-19 spread model is not acceptable.
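The text does not state how this bound was computed. As a rough illustration only, the sketch below uses a one-step Euler–Maruyama approximation of (20), which is an assumption made here: \(X_{t+\varDelta t}-X_t \approx X_t(\mu _t \varDelta t + \sigma _t \sqrt{\varDelta t}\, Z)\) with \(Z\) standard normal, so that for \(X_t>0\) the probability of a decrease is \(\Pr \{Z < -\mu _t\sqrt{\varDelta t}/\sigma _t\}\). The function names are hypothetical.

```python
import math

def mu(t):
    """Fitted drift parameter from (17)."""
    return 0.1101 / (1.0 + 0.0083 * math.exp(0.2567 * t))

def sigma(t):
    """Fitted diffusion parameter from (17)."""
    return 0.0216 / (1.0 + 0.0034 * math.exp(0.2312 * t))

def std_normal_cdf(z):
    """Standard normal probability distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_decrease(t, dt):
    """Pr{X_{t+dt} < X_t} under one Euler-Maruyama step of (20), X_t > 0."""
    return std_normal_cdf(-mu(t) * math.sqrt(dt) / sigma(t))

p = prob_decrease(30.0, 1e-6)
print(p)
```

Under this approximation the probability tends to one half as \(\varDelta t\rightarrow 0\), because the Wiener increment of order \(\sqrt{\varDelta t}\) dominates the drift of order \(\varDelta t\); this is consistent with the lower bound stated above.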

9 Conclusion

This paper presented the problem of initial value estimation for uncertain differential equations and proposed an estimation method. Furthermore, the method of moments was recast for estimating the time-varying parameters in uncertain differential equations. Using those techniques, a COVID-19 spread model based on an uncertain differential equation was derived, and the zero-day of COVID-19 spread in China was inferred.