1 Introduction

Coronaviruses are among the most significant threats to human society [1,2,3,4,5,6]. Limited to short outbreaks in the recent past [7,8,9], their pandemic-level potential was well known [10, 11], yet most countries proved unprepared to cope with the coronavirus disease of 2019 (COVID-19). First identified in the Hubei province, China, the novel coronavirus spread all over the world. China responded with massive containment measures starting at the end of January 2020, which limited further transmission on the mainland [8, 12]. In Europe, most individual states responded with similar containment measures; however, there was a lack of common European action. Strict or soft containment measures were applied over different timeframes, tailored to individual health and socio-cultural systems, resulting in very different pandemic evolutions. At the time of writing, the main episode of COVID-19 is (in general) under control in China, South Korea, and continental Europe [11], despite the possibility of multiple waves. By contrast, North America and South America are still in the middle of the pandemic, and a clear picture of the evolution of events is not yet possible.

The amount of data now available allows various modelling techniques to be tested more robustly than in previous epidemics. However, no model (from the physics-based to the purely data-driven) has been, or is, able to predict the long-term evolution of the pandemic accurately. (Conversely, short-term predictions are possible with some degree of accuracy [13, 14].) There are several reasons behind this long-term unpredictability; an incomplete list includes the partial understanding of the phenomenon, the (many, or even infinite) missing variables, the high sensitivity of the model to its parameters, the incomplete/inaccurate data acquisition scheme, and the lack of uniform measurement methods. A more profound reason that makes any long-term prediction difficult is the presence of endogenous variables (a well-known problem in social sciences [15]). The endogenous variables may involve local policies, socio-cultural aspects, human behaviours, and communication strategies, and they are typically difficult to model and measure. Since an epidemic evolves as a result of the interplay between the “natural evolution” of the disease and society/human interventions, a robust and generalizable microscopic model with complete characterizations of endogenous variables is challenging to build. Given this, in this study we use macroscopic phenomenology-based modelling to gain insight into the epidemic dynamics.Footnote 1 The goal here is therefore not to provide long-term numerical predictions, although the proposed modelling technique can be used for extrapolation.

Among various modelling options, susceptible–infected–removed (SIR) types of compartmental models have gained wide popularity due to their simplicity and their straightforward interpretation of the macroscopic phenomenology. A significant number of SIR-type model-based studies have already been carried out to investigate the transmission properties of COVID-19; an incomplete list includes [8, 12, 16,17,18,19,20]. The spectrum of complexity of these models is broad. They can range from a minimum number of compartments (which offers better generalization) to a large number of compartments (which offers a better local description). They can be deterministic (i.e. counting the deterministic number of individuals in each compartment) or stochastic (i.e. defining a joint probability measure of the number of individuals in each compartment). They can have different data acquisition schemes (from a simple frequentist analysis of the single parameters to a complete Bayesian inversion scheme). Finally, they can simply describe the pandemic evolution of a given location macroscopically (top-down approach) or include a spatial topological description (including mobility) and/or a different degree of spreading among individuals through adjacency matrices (bottom-up approach).

Given a dataset, these models can be calibrated and offer new insights into the evolution of the pandemic. For example, they can shed light on how the pandemic developed by measuring the reproductive ratio (constant or time-varying) and thereby estimating the effectiveness of containment measures. This metric allows for a comparison between different regions but does not provide a quantitative measure of the impact of the spread. On the other hand, the evolution of the numbers of infected and dead provides a direct measure of impact; however, these numbers lack objectivity, as they are strongly influenced by the different populations of the regions, the measurement strategies, and the unreliability of the data. Furthermore, they are not global metrics, as there is no objective and robust way to unify them into a single (scalar) measure. Therefore, there is a research gap on how to provide a macroscopic model–metric pair to compare different regions’ performance and gain new insights into various outbreaks.

This study aims to fill this gap by proposing a macroscopic stochastic model equipped with a global transmission metric based on entropy. In this context, the entropy evolution of the process is a metric that describes the degree of disorder (i.e. of impact) of an epidemic. This metric allows for an objective comparison between regions and provides a global measure of both the evolution and the impact of COVID-19 outbreaks. In particular, we propose a compartmental stochastic model with the following characteristics. (i) Stochastic: the model describes a statistically averaged individual by a nonlinear Markov process with compartmental epidemic states. (ii) Time-dependent: the model parameters are decomposed onto generic basis functions (of time). (iii) Parsimonious: instead of conventional orthogonal basis functions (e.g. orthogonal polynomials, Fourier/wavelet series), adaptive basis functions are adopted to achieve a representation with a minimum number of basis functions. (iv) Bayesian: the time-dependent parameters are assumed to be random and are calibrated by full Bayesian inversion. Furthermore, we equip the model with an entropy-based metric that has the following characteristics. (i) Meaningful: the metric provides a physical and transparent measure of the COVID-19 impact in a given region; moreover, it is by definition the time integral of the entropy rate, which represents the temporal evolution of the epidemic. (ii) Global: the metric provides a global and average description of the pandemic event. (iii) Consistent: the metric is not influenced by the number of individuals and can be used objectively to compare different regions. (iv) Robust: the metric is associated with an error that is a direct output of the Bayesian inversion scheme used to calibrate the stochastic model. Finally, to obtain a reliable description of the events, we provide robust strategies to fill in missing information and to correct the numerous inconsistencies in the current datasets.

The paper is organized as follows. First, we develop the general concepts of the proposed epidemic model, including the governing equation, the time-dependent parameterization, and the Bayesian model calibration (Sect. 2). Second, we introduce the entropy-based metric in Sect. 3. Third, we apply the proposed approach to formulate a modified SEIR compartmental model for the temporal evolution of COVID-19 (Sect. 4). Next, we apply the proposed approach to real datasets from the following regions: Hubei (China), South Korea, Italy, Spain, Germany, and France (Sect. 5). Finally, we conclude the study by identifying limitations, drawing conclusions, and outlining future research directions.

2 The stochastic epidemic model

In the literature, the term “stochastic compartmental model” can refer to different formulations (see e.g. [21] for a review) with distinct underlying assumptions on the source of uncertainty. For instance, the noise-driven stochastic model is formulated by: (i) introducing an additive noise process into the deterministic compartmental model; (ii) translating the noise into diffusion of the probability distribution; and (iii) obtaining an equation for the probability distribution (e.g. the Fokker–Planck equation). Clearly, in the noise-driven model the source of uncertainty is the additive noise. An alternative and more popular stochastic formulation is the event-driven model, which can be summarized as a direct stochastic simulation of the deterministic model. Specifically, in the event-driven model the deterministic rate matrix is used to define the transition probability of the event \(X_m(t)\xrightarrow {t+\varDelta t}X_m(t)\pm \varDelta X\), where \(X_m(t)\) denotes the population in a compartment and \(\varDelta X\) the intra-state increment. With the transition probability, a direct stochastic simulation (via e.g. Gillespie’s Direct Method [22, 23]) yields a random scenario of the epidemic. The proposed model can also be classified as an event-driven approach in the sense that the source of uncertainty is also the aleatory variability of transitions between epidemic states. However, instead of a stochastic simulation without a governing equation of the probability distribution, the proposed model strictly follows an equation of the probability distribution which describes a nonlinear Markov process. Consequently, the proposed model possesses clear physical interpretations within the mathematical framework of nonlinear Markov process theory.
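For concreteness, the following sketch shows an event-driven simulation of a basic SIR model with Gillespie’s Direct Method; it is a minimal illustration only, and the rates beta (infection) and gamma (removal), as well as all numerical values, are invented assumptions rather than quantities from this study.

```python
import numpy as np

def gillespie_sir(N=1000, I0=10, beta=0.3, gamma=0.1, t_max=160.0, seed=1):
    """One random scenario of a basic SIR model via Gillespie's Direct
    Method; beta and gamma are hypothetical rates per unit time."""
    rng = np.random.default_rng(seed)
    S, I, R = N - I0, I0, 0
    t, history = 0.0, [(0.0, S, I, R)]
    while I > 0 and t < t_max:
        a_inf = beta * S * I / N           # rate of S -> I events
        a_rem = gamma * I                  # rate of I -> R events
        a_tot = a_inf + a_rem
        t += rng.exponential(1.0 / a_tot)  # waiting time to next event
        if rng.random() < a_inf / a_tot:   # select which event fires
            S, I = S - 1, I + 1
        else:
            I, R = I - 1, R + 1
        history.append((t, S, I, R))
    return np.array(history)

trajectory = gillespie_sir()  # columns: t, S, I, R
```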

Compartmental models with time-varying parameters have been widely studied in the literature [24,25,26,27,28]. A fundamental question to be addressed in time-dependent models is the trade-off between over-fitting and under-fitting, or equivalently, model bias versus model variance. In the extreme scenario, a pointwise kernel-based parameterization may lead to an almost exact calibration on epidemic observations, yet the explanatory/extrapolation capability would be minimized, and the model variance would be maximized. In an over-parameterized model, the non-local trend/structure, which is crucial to characterize/understand the epidemic dynamics, can hardly be identified. In this study, we attempt to discover non-local structures from the epidemic dataset using a parsimonious formulation with adaptive basis functions. Moreover, since the model calibration is formulated in a Bayesian framework, likelihood-based model selection, e.g. using the Bayesian information criterion (BIC) [29], can be conveniently applied to specify the number of adaptive basis functions.

2.1 The original deterministic model

Consider a generic compartmental epidemic model with a fixedFootnote 2 total population N and a classification of the population into M compartments \(\varvec{X}=[X_1,\ldots ,X_M]^\top \). The compartmental epidemic model describes the temporal evolution of the state vector \(\varvec{X}\), where every component of \(\varvec{X}\) is by definition nonnegative and \(\varvec{X}\) is subjected to the conservation law \(\Vert \varvec{X}\Vert _1=N\).

For an infinitesimal increment \(\varDelta t\), we study the following master equation for the state vector.

$$\begin{aligned} \varvec{X}(t+\varDelta t)=(\varvec{I}+\varvec{H}(\varvec{X}(t),t)\varDelta t)\varvec{X}(t),\ \end{aligned}$$
(1)

where \(\varvec{I}\) is the identity matrix and \(\varvec{H}(\varvec{X}(t),t)\) is a problem-specific rate matrix (infinitesimal propagator). Equation (1) is equipped with the assumption that the evolution of \(\varvec{X}(t)\) is smooth,Footnote 3 i.e. without jumps. Setting \(\varDelta t\rightarrow 0\), Eq. (1) leads to

$$\begin{aligned} \frac{\mathrm{d}\varvec{X}(t)}{\mathrm{d}t}=\varvec{H}(\varvec{X}(t),t)\varvec{X}(t).\ \end{aligned}$$
(2)

Similar to mechanics, the rate matrix \(\varvec{H}(\varvec{X}(t),t)\) governs the dynamics of \(\varvec{X}(t)\). To preserve the conservation law \(\Vert \varvec{X}(t)\Vert _1=N\), we must have \(\varvec{H}(\varvec{X}(t),t)^\top \varvec{1}=\varvec{0}\), where \(\varvec{1}\) is a vector of ones and \(\varvec{0}\) the null vector.

In particular, suppose \(\varvec{X}(t)\) eventually attains a stationary state \(\varvec{X}^*\) defined as

$$\begin{aligned} \varvec{X}^*:=\lim _{t\rightarrow +\infty }\varvec{X}(t),\ \end{aligned}$$
(3)

and define \(\varvec{H}^*\) as

$$\begin{aligned} \varvec{H}^*:=\lim _{t\rightarrow +\infty }\varvec{H}(\varvec{X}(t),t).\ \end{aligned}$$
(4)

Then we obtain the stationarity condition

$$\begin{aligned} \varvec{H}^*\varvec{X}^*=\varvec{0},\ \end{aligned}$$
(5)

where \(\varvec{0}\) is a column vector of zeros.
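As a minimal illustration of Eqs. (1)–(2), the sketch below integrates the master equation for a toy SIR-type rate matrix and checks the conservation law; the matrix H and all parameter values are hypothetical, not the model calibrated later in the paper.

```python
import numpy as np
from scipy.integrate import solve_ivp

N = 1000.0

def H(X, beta=0.3, gamma=0.1):
    """Toy SIR-type rate matrix (infinitesimal propagator). Every column
    sums to zero, which enforces the conservation law ||X||_1 = N."""
    S, I, R = X
    return np.array([[-beta * I / N, 0.0,    0.0],
                     [ beta * I / N, -gamma, 0.0],
                     [ 0.0,           gamma, 0.0]])

rhs = lambda t, X: H(X) @ X                 # Eq. (2): dX/dt = H(X, t) X
sol = solve_ivp(rhs, (0.0, 160.0), [990.0, 10.0, 0.0], max_step=1.0)
assert np.allclose(sol.y.sum(axis=0), N)    # population is conserved
```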

2.2 Probabilistic reformulation

Given a deterministic \(\varvec{H}(\varvec{X}(t),t)\), Eq. (2) describes a deterministic trajectory of \(\varvec{X}(t)\). Since variabilities inevitably exist in the specification of \(\varvec{H}(\varvec{X}(t),t)\) and/or the initial condition, the solution \(\varvec{X}(t)\) becomes a multivariate stochastic process. However, the aforementioned “randomization” is regarded as epistemic with respect to the model Eq. (2).Footnote 4 This section focuses on a more fundamental (aleatory) probabilistic reformulation of Eq. (2).

Adopting a frequentist point of view on probability, consider a normalization of \(\varvec{X}(t)\) by

$$\begin{aligned} \varvec{P}(t):=\lim _{N\rightarrow +\infty }\frac{\varvec{X}(t,N)}{\Vert \varvec{X}(t,N)\Vert _1}=\lim _{N\rightarrow +\infty }\frac{\varvec{X}(t,N)}{N},\ \end{aligned}$$
(6)

where \(\varvec{X}(t,N)\) is used to highlight that the compartmental population depends on N, and \(\varvec{P}(t)\) can be interpreted as the marginal probability distribution of a discrete-state continuous-time stochastic process. Observe that despite \(\varvec{P}(t)\) being equivalent to proportions in a deterministic model, the probabilistic individualistic interpretation leads to a fully stochastic dynamic interpretation of the problem. The underlying state associated with \(\varvec{P}(t)\) is an epidemic state of a statistically averaged individual. Analogous to Eqs. (1) and (2), we obtain

$$\begin{aligned} \varvec{P}(t+\varDelta t)=(\varvec{I}+\varvec{Q}(\varvec{P}(t),t)\varDelta t)\varvec{P}(t),\ \end{aligned}$$
(7)

and

$$\begin{aligned} \frac{\mathrm{d}\varvec{P}(t)}{\mathrm{d}t}=\varvec{Q}(\varvec{P}(t),t)\varvec{P}(t),\ \end{aligned}$$
(8)

where \(\varvec{Q}(\varvec{P}(t),t)\) is a rate matrix analogous to \(\varvec{H}(\varvec{X}(t),t)\) in the deterministic model. Since \(\varvec{Q}(\varvec{P}(t),t)\) explicitly depends on \(\varvec{P}(t)\), Eq. (8) describes a nonlinear Markov process [30]. The conservation of probability is guaranteed by \(\varvec{Q}(t)^\top \varvec{1}=\varvec{0}\).

Equation (7) provides a straightforward strategy for sampling random realizations of the process. In particular, for a fixed initial condition \(\varvec{P}(t_0)\), the solution \(\varvec{P}(t)\) is deterministic, and \(\varvec{Q}(\varvec{P}(t),t)\) can be regarded as \(\varvec{Q}(t)\) with \(\varvec{P}(t)\) being a time-dependent parameter of \(\varvec{Q}(t)\). The resulting tangent non-homogeneous Markov process has the following transient stochastic matrix

$$\begin{aligned} \varvec{S}(t,t+\varDelta t):=\varvec{I}+\varvec{Q}(t)\varDelta t.\ \end{aligned}$$
(9)

In line with the macroscopic description [Eq. (6)], the initial condition \(\varvec{P}(t_0)\) and the rate matrix \(\varvec{Q}(\varvec{P}(t),t)\) are by definition exactly the same for all N individuals. This assumption corresponds ad verbum to fixing a constant average number of contacts and other interaction parameters between persons per unit time. Given this, there is an implicit assumption of statistical independence among the N individuals. In an adiabatic system, this is equivalent to letting N particles follow N independent Brownian motions. Therefore, this macro-description emerges from the micro-behaviour of individuals interacting according to N independent Brownian motions, with the virus spreading according to a simple diffusive process.Footnote 5 Consequently, a macro-random scenario of an epidemic can be obtained by simulating N independent and identically distributed processes from Eq. (8).
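A minimal sketch of this sampling strategy is given below: it propagates the marginal distribution with Eq. (7) and advances N independent individual chains with the tangent stochastic matrix of Eq. (9). The function Q_of is an assumed user-supplied rate matrix, and the time step must be small enough for \(\varvec{I}+\varvec{Q}\varDelta t\) to remain a valid (non-negative) stochastic matrix.

```python
import numpy as np

def sample_paths(P0, Q_of, t_grid, n_individuals=1000, seed=0):
    """Sample N i.i.d. individual epidemic-state paths from the nonlinear
    Markov process; Q_of(P, t) returns the rate matrix, whose columns
    sum to zero so that each column of S sums to one."""
    rng = np.random.default_rng(seed)
    P = np.asarray(P0, dtype=float)
    M = P.size
    states = rng.choice(M, size=n_individuals, p=P)  # draw initial states
    paths = [states.copy()]
    for tj, tj1 in zip(t_grid[:-1], t_grid[1:]):
        S = np.eye(M) + Q_of(P, tj) * (tj1 - tj)     # Eq. (9)
        for n in range(n_individuals):               # advance each chain
            states[n] = rng.choice(M, p=S[:, states[n]])
        P = S @ P                                    # Eq. (7), mean field
        paths.append(states.copy())
    return np.array(paths)
```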

In contrast to this macro-description, one could adopt a topological structure for the interactions between different individuals. This is generally done by including an adjacency operator which accounts for the different structure of the interactions among individuals (e.g. including mobility information, or considering the presence of superspreaders). As a consequence, each individual (or group of individuals) has a different average number of contacts and different interaction parameters. This leads to a heterogeneous compartmental model, which is inevitably dependent on a specific geographical area or social system. In this study, we focus on the general transmission trend of large regions, so that the trends can be more easily extrapolated and interpreted. Therefore, the simple macro-description is adopted. This specific choice leads to a novel entropy-based measure with which to macroscopically compare the epidemic scenarios in different regions.

2.3 Time-dependent parameter model

We assume the “correct” model of \(\varvec{Q}(\varvec{P}(t),t)\) cannot be discovered, and \(\varvec{Q}(\varvec{P}(t),t)\) is replaced by a parametric model with a set of parameters \({\varvec{\alpha }}(t)\), i.e.

$$\begin{aligned} \varvec{Q}(\varvec{P}(t),t)\approx {\varvec{Q}}(\varvec{P}(t),t;{\varvec{\alpha }}(t)).\ \end{aligned}$$
(10)

Let \(\alpha (t)\) represent an arbitrary component of \({\varvec{\alpha }}(t)\). A generic approach to parameterizing \(\alpha (t)\) is to consider an expansion of the following form

$$\begin{aligned} \alpha (t)=\sum _{i=0}^{I}w_i\psi _i(t),\ \end{aligned}$$
(11)

where \(w_i\) are coordinates of the basis functions \(\psi _i(t)\). A popular choice for the basis functions is orthogonal polynomials, e.g. Legendre/Hermite/Laguerre/Chebyshev polynomials. An issue with an orthogonal polynomial basis is that it may require high-order terms to represent a complex function, which in turn leads to over-fitting and implausible extrapolations. A powerful alternative is to use adaptive basis functions of the form

$$\begin{aligned} \alpha (t)=\sum _{i=0}^{I}w_{i}\psi _i(t,\varvec{w}'_{i}),\ \end{aligned}$$
(12)

where \(\varvec{w}'_i\) are parameters of the adaptive basis. The benefit of using Eq. (12) instead of Eq. (11) is that a parsimonious representation can be formulated, at the cost of introducing additional parameters in the bases. An attractive choice for the adaptive basis \(\psi _i(t,\varvec{w}'_i)\) is the sigmoid function, i.e.

$$\begin{aligned} \psi _i(t,\varvec{w}'_i)=\frac{1}{1+\exp (w'_{i1}-w'_{i2}t)}.\ \end{aligned}$$
(13)

The theoretical justification for using Eq. (13) in Eq. (12) is the universal approximation theorem [31], and the resulting parametric function is in fact a feed-forward neural network with a single hidden layer.
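A possible transcription of the adaptive sigmoid expansion of Eqs. (12)–(13) is sketched below; the weights in the usage example are invented for illustration (a transmission-like parameter decaying around day 30, e.g. mimicking a containment measure) and are not calibrated values.

```python
import numpy as np

def alpha_t(t, w, w_prime):
    """Time-dependent parameter as a sum of adaptive sigmoid bases,
    Eqs. (12)-(13): alpha(t) = sum_i w_i / (1 + exp(w'_i1 - w'_i2 t))."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    for w_i, (w_i1, w_i2) in zip(w, w_prime):
        out += w_i / (1.0 + np.exp(w_i1 - w_i2 * t))
    return out

# Invented weights: a rate plateauing near 0.5 that drops to ~0.1
# around day 30 (first basis is ~constant, second activates at t=30).
t = np.arange(0, 100)
a = alpha_t(t, w=[0.5, -0.4], w_prime=[[-50.0, 0.0], [9.0, 0.3]])
```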

In addition, the initial condition of Eq. (8) is unknown, and we parameterize \(\varvec{P}(t_0)\) by \({\varvec{\beta }}=[\beta _1,\ldots ,\beta _{M-1}]\) (recall that M is the number of compartments). A natural parameterization of \(\varvec{P}(t_0)\) is

$$\begin{aligned} \varvec{P}(t_0)=\left[ \beta _1,\ldots ,\beta _{M-1},1-\sum _{m=1}^{M-1}\beta _m\right] ^\top ,\ \end{aligned}$$
(14)

where \(\beta _m\) are nonnegative and subjected to the linear constraint \(\sum _{m=1}^{M-1}\beta _m\in [0,1]\). Note that \({\varvec{\beta }}\) is time independent in the sense that the starting time point can be fixed. Therefore, the full parameter set of the epidemic model is written as \({\varvec{\theta }}:=\left\{ \varvec{w}, \varvec{w}', {\varvec{\beta }}\right\} \).

2.4 Model calibration

The goal of model calibration is to find the optimal \({\varvec{\theta }}\) using real observations. We let \(\varvec{\mathcal {D}}\) denote the dataset of observations collected for an epidemic up to some reference time point. The dataset \(\varvec{\mathcal {D}}\) is composed of discrete measurements of the number of persons in each observable compartment (e.g. infected, recovered, and dead); \(\varvec{\mathcal {D}}\) is a matrix of dimension \(M_o\times T\), where \(M_o\) denotes the number of observable compartments and T denotes the number of observed unit times (e.g. days).

The likelihood function \(\mathcal {L}(\varvec{\mathcal {D}}|{\varvec{\theta }})\) measures the probability of observing \(\varvec{\mathcal {D}}\) given the model specified by \({\varvec{\theta }}\). Using Bayes’ rule on \({\varvec{\theta }}\), we have

$$\begin{aligned} \pi ({\varvec{\theta }}|\varvec{{\mathcal {D}}})\propto \mathcal {L}(\varvec{{\mathcal {D}}}|{\varvec{\theta }})\pi ({\varvec{\theta }}), \end{aligned}$$
(15)

where \(\pi ({\varvec{\theta }}|\varvec{{\mathcal {D}}})\) is the posterior distribution of \({\varvec{\theta }}\) conditional on the observed dataset \(\varvec{{\mathcal {D}}}\) and \(\pi ({\varvec{\theta }})\) is the prior distribution of \({\varvec{\theta }}\). The major challenge of using Eq. (15) in practice is to sample from the posterior, and typically, this can be handled by advanced Markov chain Monte Carlo methods.

The likelihood \(\mathcal {L}(\varvec{\mathcal {D}}|{\varvec{\theta }})\) may depend on both the observation error and the inherent variability of the epidemic model. Even by setting the observation error to zero, for any specified \({\varvec{\theta }}\) the prediction from the model is still random. If the accumulated numbers are of interest, e.g. the total number of recovered, for a large population size the variability in the prediction is expected to be small. Specifically, in a multinomial model the marginal coefficient of variation is proportional to \(1/\sqrt{NP_m(t)}\). However, at the same time, the model prediction can be extremely sensitive to \({\varvec{\theta }}\), and an almost negligible perturbation due to the (albeit small) randomness of \({\varvec{\theta }}\) may lead to noticeably different predictions. Therefore, the Bayesian analysis is meaningful with or without the observation error.

To formulate the likelihood function, we first denote an individual (directed) random walk among the states by the Boolean array \(\varvec{Y}^{(n)}=[\varvec{y}^{(n)}_1,\ldots ,\varvec{y}^{(n)}_j,\ldots ,\varvec{y}^{(n)}_T]\), where \(n\in [1,\ldots ,N]\) and \(j\in [ 1,\ldots ,T]\) such that \(t_{j+1}-t_j = 1\) [unit time].Footnote 6 The vectors \(\varvec{y}^{(n)}_j\) (of dimension \(M_o\times 1\)) represent the state of the nth person at time \(t_j\); their components are all zero with the exception of the current state, which takes the value of one. The joint probability density function of \(\varvec{Y}^{(n)}\), denoted by \(f(\varvec{Y}^{(n)}|{\varvec{\theta }})\), is readily available from the governing equation Eq. (8). Next, we note that the observation \(\varvec{\mathcal {D}}\) represents a collective scenario of the N independent (under the assumptions of the macro-model) Markov processes \(\varvec{y}^{(n)}_j\).

Therefore, by brute force, the (observation-error-free) likelihood function has the following form

$$\begin{aligned} \mathcal {L}(\varvec{\mathcal {D}}|{\varvec{\theta }})=\sum _{y^{(n)}_{m,t}}\left[ \mathbb {1}\left( \sum _{n}\varvec{Y}^{(n)} = \varvec{\mathcal {D}}\right) \prod _{n} f\left( \varvec{Y}^{(n)}|\varvec{\theta }\right) \right] , \end{aligned}$$
(16)

where \(\mathbb {1}(\cdot )\) is an indicator function and \(\sum _{y^{(n)}_{m,t}}\) is an \(M_o^{T\times N}\)-fold summation. This brute-force summation contains impossible paths, which are, however, naturally excluded by the indicator function. Observe that this likelihood is fundamentally different from that of deterministic compartmental models, being based on individual probabilities rather than proportions. Moreover, it is also different from the classical binomial (and related) likelihood approaches (used in direct Gillespie’s methods). Equation (16) is clearly computationally intractable.

To formulate a computationally tractable likelihood function, we use the Markovian property and rewrite \(\mathcal {L}(\varvec{\mathcal {D}}|{\varvec{\theta }})\) as

$$\begin{aligned} \mathcal {L}(\varvec{\mathcal {D}}|{\varvec{\theta }})=\mathcal {L}(\varvec{\mathcal {D}}(t_0)|{\varvec{\theta }})\prod _{j=1}^{T}\mathcal {L}(\varvec{\mathcal {D}}(t_j)|\varvec{\mathcal {D}}(t_{j-1});{\varvec{\theta }}), \end{aligned}$$
(17)

where \(\varvec{\mathcal {D}}(t_j)\) denotes the observation at time point \(t_j\). The first term \(\mathcal {L}(\varvec{\mathcal {D}}(t_0)|{\varvec{\theta }})\) can be easily computed from a multinomial distribution with probability vector \(\varvec{P}(t_0)\). The specific expression of \(\mathcal {L}(\varvec{\mathcal {D}}(t_j)|\varvec{\mathcal {D}}(t_{j-1});{\varvec{\theta }})\) varies with the adopted epidemic model, yet it is typically in the multinomial form. All the ingredients to compute \(\mathcal {L}(\varvec{\mathcal {D}}(t_j)|\varvec{\mathcal {D}}(t_{j-1});{\varvec{\theta }})\) are included in the marginal distribution \(\varvec{P}(t)\), and the stochastic matrix \(\varvec{S}(t_j,t_{j+1}|{\varvec{\theta }})\) is expressed as

$$\begin{aligned} \varvec{S}(t_j,t_{j+1}|{\varvec{\theta }})=\exp \left( \int _{t_j}^{t_{j+1}}\varvec{Q}(\tau |{\varvec{\theta }})\,\mathrm{d}\tau \right) .\ \end{aligned}$$
(18)

Observe that, due to the discretization of a continuous-time Markov process into a discrete-time Markov process, the matrix \(\varvec{S}(t_j,t_{j+1}|{\varvec{\theta }})\) is “less sparse” than \(\varvec{Q}(t|{\varvec{\theta }})\). For example, in a finite time interval, the impossible event \(2\rightarrow 4\) in matrix \(\varvec{Q}\) may have a finite probability of occurring in matrix \(\varvec{S}\) (through e.g. \(2\rightarrow 3\rightarrow 4\)). In fact, Eq. (18) can be interpreted as the result of applying Eq. (9) infinitely many times within the integration interval.

For a simple illustration of the concept of constructing the likelihood function, we consider a two-state system in which state 1 can either move to state 2 or stay still, while state 2 can only stay still. We assume \(\varvec{\mathcal {D}}(t_{j-1})\) records [100, 50] in occupations of states 1 and 2 (for a total of 150 Markov chains), and \(\varvec{\mathcal {D}}(t_{j})\) records [90, 60]. Given the aforementioned transition structure, we know that 10 out of the 100 chains moved from state 1 to state 2 at \(t_{j}\). Therefore, the likelihood \(\mathcal {L}(\varvec{\mathcal {D}}(t_j)|\varvec{\mathcal {D}}(t_{j-1});{\varvec{\theta }})\) is simply the binomial \({{100}\atopwithdelims (){10}}P_{1\rightarrow 2}^{10}P_{1\rightarrow 1}^{90}\), where the transition probabilities \(P_{i\rightarrow j}\) can be directly read from Eq. (18). One may not be able to observe the populations in all compartments; in this case, the total probability theorem can be used to integrate the unobservable states out (see Sect. 4 for an example).
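The following sketch reproduces this two-state example numerically, obtaining the transition probabilities from the matrix exponential of Eq. (18); the rate q12 is a hypothetical value.

```python
import numpy as np
from scipy.linalg import expm
from scipy.stats import binom

q12 = 0.12                        # hypothetical 1 -> 2 rate per unit time
Q = np.array([[-q12, 0.0],        # columns sum to zero
              [ q12, 0.0]])
S = expm(Q * 1.0)                 # Eq. (18) over one unit time step
P_11, P_12 = S[0, 0], S[1, 0]     # stay in state 1, move 1 -> 2

# D(t_{j-1}) = [100, 50] and D(t_j) = [90, 60]: 10 of 100 chains moved,
# so the conditional likelihood is the binomial C(100,10) P12^10 P11^90.
lik = binom.pmf(10, 100, P_12)
```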

2.5 Addressing data unreliability

The likelihood function introduced above only considers the inherent stochastic variability of the model. In reality, on top of the inherent stochastic variability, the underlying errors/uncertainties of a reported dataset involve multiple alternative sources. A rigorous way to treat such unreliability of reported data is to introduce a distribution assumption on the error \({\varvec{\epsilon }}\), and the likelihood function can be written as

$$\begin{aligned} \mathcal {L}(\varvec{{\mathcal {D}}}|{\varvec{\theta }})=\int _{{\varvec{\epsilon }}\in \varOmega _{{\varvec{\epsilon }}}}\mathcal {L}(\varvec{{\mathcal {D}}}|{\varvec{\theta }},{\varvec{\epsilon }})\pi ({\varvec{\epsilon }})\,\mathrm{d}{\varvec{\epsilon }}, \end{aligned}$$
(19)

where \(\mathcal {L}(\varvec{{\mathcal {D}}}|{\varvec{\theta }},{\varvec{\epsilon }})\) is the likelihood with a specified error, \(\pi ({\varvec{\epsilon }})\) is the probability distribution of the error, and \(\varOmega _{{\varvec{\epsilon }}}\) represents the feasible domain of the error. Note that in general \({\varvec{\epsilon }}\) represents a set of discretized stochastic processes. Apart from the technical challenge of integrating the high-dimensional Eq. (19), the major challenge of incorporating the error is the specification of \(\pi ({\varvec{\epsilon }})\). Clearly, an assumption on \(\pi ({\varvec{\epsilon }})\) would reshape the likelihood function towards the shape of \(\pi ({\varvec{\epsilon }})\), and an inappropriate assumption would generate artificial and even misleading transmission properties. Therefore, we adopt an indirect path to incorporate the unreliability of reported data. Specifically, we apply a kernel function \(\kappa (\cdot )\) to the original error-free likelihood function, i.e.

$$\begin{aligned} \hat{\mathcal {L}}(\varvec{{\mathcal {D}}}|{\varvec{\theta }})=\kappa \left( {\mathcal {L}}(\varvec{{\mathcal {D}}}|{\varvec{\theta }})\right) . \end{aligned}$$
(20)

The kernel function is selected to “flatten” the likelihood function so that the unreliability in the reported data can be, to some extent, captured. In this study, we consider an exponential kernel, and Eq. (20) is rewritten as

$$\begin{aligned} \hat{\mathcal {L}}(\varvec{{\mathcal {D}}}|{\varvec{\theta }})=\exp \frac{\log {\mathcal {L}}(\varvec{{\mathcal {D}}}|{\varvec{\theta }})}{n_{\epsilon }}, \end{aligned}$$
(21)

where \(n_{\epsilon }>1\) is a scaling factor. Clearly, if \(n_{\epsilon }=1\), \(\hat{\mathcal {L}}(\varvec{{\mathcal {D}}}|{\varvec{\theta }})\) is identical to the original likelihood \({\mathcal {L}}(\varvec{{\mathcal {D}}}|{\varvec{\theta }})\), and as \(n_{\epsilon }\rightarrow \infty \), \(\hat{\mathcal {L}}(\varvec{{\mathcal {D}}}|{\varvec{\theta }})\) approaches a flat (uniform) function.

Instead of a specific error distribution, in practice one is more likely to have a crude idea of the possible magnitude of the errors in the reported dataset. For a further simplification, we focus on the errors in the infected cases, since the causal structure of infected and recovered/dead would let the errors in the infected eventually flow into the recovered/dead. The remaining question is then how to relate “the magnitude of errors in infected cases” to the \(n_{\epsilon }\) in Eq. (21). It turns out that, through a sequence of qualitative arguments, a reasonable choice of \(n_{\epsilon }\) is to let

$$\begin{aligned} n_{\epsilon }\propto \frac{\varDelta _\epsilon ^2}{\varDelta _\mathrm{infected}}, \end{aligned}$$
(22)

where \(\varDelta _\mathrm{infected}\) represents the maximum increment of infected and \(\varDelta _\epsilon \) represents the possible error in the maximum increment of infected. Note that Eq. (22) is proposed as crude guidance for setting the magnitude of \(n_{\epsilon }\). The reasoning behind Eq. (22) is as follows. (i) In Eq. (21), if \({\mathcal {L}}(\varvec{{\mathcal {D}}}|{\varvec{\theta }})\) is Gaussian, the effect of applying \(1/n_{\epsilon }\) is to introduce a scaling factor of \(n_{\epsilon }\) to the covariance of \({\mathcal {L}}(\varvec{{\mathcal {D}}}|{\varvec{\theta }})\). (ii) The likelihood \({\mathcal {L}}(\varvec{{\mathcal {D}}}|{\varvec{\theta }})\) is a product of multinomial kernels (see Eq. (17)), which can be approximated by a Gaussian with the maximum variance (of the infected compartment) of the size of \(\varDelta _\mathrm{infected}\). (iii) Equation (22) is obtained by assuming that the scaled variance (scaled by the factor \(n_{\epsilon }\)) has a magnitude similar to \(\varDelta _{\epsilon }^2\).Footnote 7 For example, if one has a crude idea that the error in the infected can be 30% of the reported infected, using Eq. (22) one could set \(n_{\epsilon }\propto 0.09\varDelta _\mathrm{infected}\).
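As a rough sketch of Eqs. (21)–(22), the flattening can be applied directly in log space, with \(n_{\epsilon }\) set from a crude error magnitude; the numbers below are invented for illustration.

```python
import numpy as np

def tempered_log_lik(log_lik, n_eps):
    """Exponential kernel of Eq. (21) in log space: flattening the
    likelihood amounts to dividing its logarithm by n_eps > 1."""
    return np.asarray(log_lik) / n_eps

# Eq. (22) guidance with an assumed ~30% error on the maximum daily
# increment of infected, i.e. Delta_eps = 0.3 * Delta_infected:
delta_infected = 3000.0                               # invented magnitude
n_eps = (0.3 * delta_infected) ** 2 / delta_infected  # 0.09 * 3000 = 270
```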

3 Entropy as a global transmission metric

In the literature, only a few studies have investigated the application of entropy in epidemic modelling [32,33,34]. Moreover, in these previous works the motivation and formulation of entropy, as well as the adopted epidemic modelling framework, are entirely different from the current study. The key feature of the proposed stochastic model is that entropy-based transmission measures can be naturally developed. Specifically, for a discretized time grid \(\lbrace t_j,j=1,\ldots ,T\rbrace \) and stochastic matrix \(\varvec{S}(t_j,t_{j+1})\) (Eq. (18)), we consider the Shannon entropy rate expressed as

$$\begin{aligned}&\mathcal {H}(t_j|t_{j-1})\nonumber \\&\quad =-\sum _{m=1}^M\sum _{n=1}^MP_n(t_{j-1})S_{m,n}(t_{j-1},t_{j})\log (S_{m,n}(t_{j-1},t_{j})).\ \end{aligned}$$
(23)

For \(j=0\), \(\mathcal {H}(t_0|t_{-1})\equiv \mathcal {H}(t_0)=-\sum _{m=1}^{M}P_m(t_0)\log P_m(t_0)\). Recall that the marginal distribution \(\varvec{P}(t)\) and the stochastic matrix \(\varvec{S}(t_j,t_{j+1})\) vary with the initial condition \(\varvec{P}(t_0)\). Therefore, Eq. (23) and \(\mathcal {H}(t_0)\) should be averaged over the posterior distribution of the initial condition (obtained from the Bayesian analysis). In evaluating Eq. (23), the convention \(0\log 0\equiv 0\) is adopted.

In a homogeneous Markov process, the entropy rate is constant, and one has the important theoretical result \(\lim _{T\rightarrow \infty }\frac{1}{T}\mathcal {H}(t_0,t_1,\ldots ,t_T)=\mathcal {H}(t_1|t_0)\). In the proposed epidemic model, the Markov process is nonlinear and non-homogeneous. Therefore, the evolution of the entropy rate \(\mathcal {H}(t_j|t_{j-1})\) within a specified duration should be considered, as it characterizes the evolution of the degree of disorder.

Using the Markovian property of the epidemic model in conjunction with the additive property of entropy,Footnote 8 the entropy \(\mathcal {H}(t_0,t_1,\ldots ,t_T)\) has the concise form

$$\begin{aligned} \mathcal {H}(t_0,t_1,\ldots ,t_T)=\sum _{j=0}^T\mathcal {H}(t_j|t_{j-1}).\ \end{aligned}$$
(24)

The entropy \(\mathcal {H}(t_0,t_1,\ldots ,t_T)\) is a scalar, and it provides a global measure on the total degree of disorder for an epidemic scenario. An important feature (shared by the reproductive ratio) of the entropy rate and the total entropy is that they are quantitatively comparable across different regions. This is because the entropy-based measures are associated with the statistically averaged individual, which is similar to measuring the mean-field approximation of the complex epidemic dynamics system.

Fig. 1

Diagram of the modified SEIR model. The transition from susceptible to exposed involves the term \(\alpha _1(t)P_2(t)\), indicating that the exposed are contagious

Qualitatively speaking, a large \(\mathcal {H}(t_0,t_1,\ldots ,t_T)\) may be contributed by: (i) a large pulse-like \(\mathcal {H}(t_j|t_{j-1})\), i.e. the entropy rate reaches high values but stays there for a short period; (ii) a moderate flat \(\mathcal {H}(t_j|t_{j-1})\), i.e. the entropy rate evolves with moderate values for a long period. In an epidemic scenario, a large pulse-like evolution of the entropy rate implies that the virus reaches a significant proportion of the population but is damped out (through the accumulation of recovered/dead) quickly, and a flat evolution implies that the epidemic spreads in a moderately severe state for a long time. To quantitatively analyse whether the entropy rate evolution is pulse-like or flat, we introduce a concentration measure for \(\mathcal {H}(t_j|t_{j-1})\). Specifically, we again adopt the concept of Shannon entropy such that the concentration measure of \(\mathcal {H}(t_j|t_{j-1})\) is defined as the inverse of the Shannon entropy of the normalized \(\mathcal {H}(t_j|t_{j-1})\), i.e.

$$\begin{aligned} \mathcal {C}(\mathcal {H})&=\frac{1}{\mathcal {H}(\bar{\mathcal {H}}(t_{j}|t_{j-1}))}\\ &=-\frac{\mathcal {H}(t_0,t_1,\ldots ,t_T)}{\sum _{j=0}^T\mathcal {H}(t_{j}|t_{j-1})\left( \log {\mathcal {H}(t_{j}|t_{j-1})}-\log {\mathcal {H}(t_0,t_1,\ldots ,t_T)}\right) },\ \end{aligned}$$
(25)

where \(\bar{\mathcal {H}}(t_{j}|t_{j-1})=\mathcal {H}(t_{j}|t_{j-1})/\mathcal {H}(t_0,t_1,\ldots ,t_T)\) is the normalized \(\mathcal {H}(t_j|t_{j-1})\). Note that the total entropy \(\mathcal {H}(t_0,t_1,\ldots ,t_T)\) appears in Eq. (25) as the normalizing constant of \(\mathcal {H}(t_j|t_{j-1})\) (when \(\mathcal {H}(t_j|t_{j-1})\) is normalized into a probability mass function). Also note that the Shannon entropy, instead of variance-based measures, is adopted since a large variance does not necessarily reflect a large dispersion (e.g. a mixture model with highly concentrated component densities could produce a large variance).
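A compact sketch of the three entropy-based measures (Eqs. (23)–(25)) is given below, assuming the column-stochastic convention \(S_{m,n}=P(m|n)\) used throughout this section; the inputs are presumed to come from a calibrated model.

```python
import numpy as np

def entropy_rate(P_prev, S):
    """Shannon entropy rate of Eq. (23). S[m, n] is the transition
    probability n -> m (columns sum to one); convention 0 log 0 = 0."""
    logS = np.where(S > 0.0, np.log(np.where(S > 0.0, S, 1.0)), 0.0)
    return -float(np.sum(P_prev[np.newaxis, :] * S * logS))

def total_entropy_and_concentration(rates):
    """Total entropy, Eq. (24), and concentration factor, Eq. (25),
    from the sequence of entropy rates H(t_j | t_{j-1})."""
    rates = np.asarray(rates, dtype=float)
    H_tot = rates.sum()
    h_bar = rates[rates > 0.0] / H_tot       # normalized entropy rates
    concentration = 1.0 / (-np.sum(h_bar * np.log(h_bar)))
    return H_tot, concentration
```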

In this paper, we propose the entropy rate \(\mathcal {H}(t_j|t_{j-1})\), the entropy \(\mathcal {H}(t_0,t_1,\ldots ,t_T)\), and the concentration factor \(\mathcal {C}(\mathcal {H})\) as complements to the conventional reproductive ratio. Appendix A illustrates various attractive features of the entropy-based measures. In practice, instead of computing entropy-based measures for the original distribution vector \(\varvec{P}\) and the stochastic matrix \(\varvec{S}\), one may need to reshape \(\varvec{P}\) and \(\varvec{S}\) to obtain measures of different scales. For example, a typical epidemic model may involve the states of recovered and dead. Naturally, one would prefer the scenario of “a large recovery probability and a small death probability” over “a large death probability and a small recovery probability.” However, Eq. (23) or Eq. (24) does not differentiate between recovery and death, and the aforementioned two scenarios can have exactly the same entropy (rate). The conventional reproductive ratio measure has the same issue. To let the entropy-based measures incorporate the concept of “high recovery probability is preferable over high death probability”, one could reshape the distribution vector \(\varvec{P}\) and the stochastic matrix \(\varvec{S}\) by merging the recovery state with the infected state. Consequently, the entropy (rate) of the reshaped system would diminish the contribution from the recovered state and highlight the contribution from the dead state (see Appendix A for an example).

4 Application to COVID-19

In the light of the general framework introduced in Sects. 2 and 3, this section introduces a simple modification of the SEIR model in which the exposed are also contagious.

4.1 Modified SEIR

The modified SEIR has a five-dimensional probability state vector \(\varvec{P}(t)\) described as follows:

  • \(P_1(t)\): the (instantaneous probability of being) susceptible.

  • \(P_2(t)\): the exposed.

  • \(P_3(t)\): the infected.

  • \(P_4(t)\): the recovered.

  • \(P_5(t)\): the dead.

The rate matrix \(\varvec{Q}(\varvec{P}(t),t)\) is written as

$$\begin{aligned} \varvec{Q}(\varvec{P}(t),t) =\begin{bmatrix} -(\alpha _1(t)P_2(t)+\alpha _2(t)P_3(t)) & 0 & 0 & 0 & 0 \\ \alpha _1(t)P_2(t)+\alpha _2(t)P_3(t) & -\alpha _3(t) & 0 & 0 & 0 \\ 0 & \alpha _3(t) & -(\alpha _4(t)+\alpha _5(t)) & 0 & 0 \\ 0 & 0 & \alpha _4(t) & 0 & 0 \\ 0 & 0 & \alpha _5(t) & 0 & 0 \\ \end{bmatrix}, \end{aligned}$$
(26)

where \({\varvec{\alpha }}(t)=[\alpha _1(t),\ldots ,\alpha _5(t)]\) are non-negative parameters to be calibrated. Note that for generality we write every parameter as time dependent; in practice, however, it is typically sufficient to set only a few of them as time dependent. Figure 1 illustrates the flow between compartments of the modified SEIR model.
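A direct transcription of Eq. (26) might look as follows; it is a sketch in which the parameter values are supplied by the caller (e.g. evaluated from the basis expansion of Sect. 2.3 at time t).

```python
import numpy as np

def Q_seir(P, alpha):
    """Rate matrix of the modified SEIR model, Eq. (26). P is the
    five-state probability vector [P1..P5] and alpha = [a1..a5] holds
    the instantaneous parameter values; every column sums to zero."""
    a1, a2, a3, a4, a5 = alpha
    lam = a1 * P[1] + a2 * P[2]   # exposed and infected both contagious
    return np.array([
        [-lam, 0.0,  0.0,       0.0, 0.0],
        [ lam, -a3,  0.0,       0.0, 0.0],
        [ 0.0,  a3, -(a4 + a5), 0.0, 0.0],
        [ 0.0, 0.0,  a4,        0.0, 0.0],
        [ 0.0, 0.0,  a5,        0.0, 0.0],
    ])
```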

4.2 Likelihood function

The likelihood function can be derived as a simple application of the concepts introduced in Sect. 2.4. First, we introduce a compound state, denoted by \(1\vee 2\), to represent the state of being either susceptible or exposed. The most important property of the compound state \(1\vee 2\) is that it is observable: if an individual is neither infected nor recovered/dead, it is in the compound state. We let \(P_{1\vee 2\rightarrow m}(t_j,t_{j+1})\) represent the transition probability from the compound state at \(t_j\) to the other states m, \(m=3\) (infected), 4 (recovered), 5 (dead), at \(t_{j+1}\). Using the total probability theorem, \(P_{1\vee 2\rightarrow m}(t_j,t_{j+1})\) can be expressed as

$$\begin{aligned}&P_{1\vee 2\rightarrow m}(t_j,t_{j+1})\nonumber \\&\quad =\frac{P_1(t_j)}{P_1(t_j)+P_2(t_j)}P_{1\rightarrow m}(t_j,t_{j+1})\nonumber \\&\qquad +\frac{P_2(t_j)}{P_1(t_j)+P_2(t_j)}P_{2\rightarrow m}(t_j,t_{j+1}),\ \end{aligned}$$
(27)

where \(P_1(t_j)\) and \(P_2(t_j)\) are solutions of Eq. (8), and \(P_{1\rightarrow m}\) and \(P_{2\rightarrow m}\) can be obtained from Eq. (18). Next, we arrange the dataset vector \(\varvec{\mathcal {D}}(t_j)\) in the form \(\varvec{\mathcal {D}}(t_j)=\left[ \mathcal {D}_{1\vee 2}(t_j),\mathcal {D}_{3}(t_j),\mathcal {D}_{4}(t_j),\mathcal {D}_{5}(t_j)\right] \) to represent, respectively, the instantaneous number in the compound state, the instantaneous number of infected, the cumulative number of recovered, and the cumulative number of dead. Let \(\varDelta \varvec{\mathcal {D}}(t_j,t_{j+1}):=|\varvec{\mathcal {D}}(t_{j+1})-\varvec{\mathcal {D}}(t_j)|\) represent the absolute difference between two consecutive dataset vectors.

Before introducing the likelihood function, we introduce the additional assumption that the \(\varDelta \mathcal {D}_{1\vee 2}(t_j,t_{j+1})\) Markov chains all transit to state 3 (the infected). This assumption can always be made correct since: (i) if \(t_j\) is sufficiently close to \(t_{j+1}\), one naturally cannot jump from susceptible/exposed to the recovered/dead state; (ii) if \(t_{j+1}-t_j\) is large, one can re-mesh the timescale and interpolate the dataset, so that \(t_j\) is always close to \(t_{j+1}\) by construction. Finally, the conditional likelihood function \(\mathcal {L}(\varvec{\mathcal {D}}(t_{j+1})|\varvec{\mathcal {D}}(t_j);{\varvec{\theta }})\) can be written as

$$\begin{aligned} \mathcal {L}(\varvec{\mathcal {D}}(t_{j+1})|\varvec{\mathcal {D}}(t_j);{\varvec{\theta }})=\mathcal {L}_1\mathcal {L}_2,\ \end{aligned}$$
(28)

where

$$\begin{aligned} \mathcal {L}_1&=\frac{\mathcal {D}_{1\vee 2}!}{(\mathcal {D}_{1\vee 2}-\varDelta \mathcal {D}_{1\vee 2})!\,\varDelta \mathcal {D}_{1\vee 2}!\,0!\,0!}(P_{1\vee 2\rightarrow 1\vee 2})^{\mathcal {D}_{1\vee 2}-\varDelta \mathcal {D}_{1\vee 2}}(P_{1\vee 2\rightarrow 3})^{\varDelta \mathcal {D}_{1\vee 2}}(P_{1\vee 2\rightarrow 4})^{0}(P_{1\vee 2\rightarrow 5})^{0} \end{aligned}$$
(29)

and

$$\begin{aligned} \mathcal {L}_2&=\frac{\mathcal {D}_{3}!}{(\mathcal {D}_{3}-\varDelta \mathcal {D}_{4}-\varDelta \mathcal {D}_{5})!\,\varDelta \mathcal {D}_{4}!\,\varDelta \mathcal {D}_{5}!}(P_{3\rightarrow 3})^{\mathcal {D}_{3}-\varDelta \mathcal {D}_{4}-\varDelta \mathcal {D}_{5}}(P_{3\rightarrow 4})^{\varDelta \mathcal {D}_{4}}(P_{3\rightarrow 5})^{\varDelta \mathcal {D}_{5}}. \end{aligned}$$
(30)

In Eqs. (29) and (30), the notation is simplified to drop \(t_j\), \(t_{j+1}\), and \({\varvec{\theta }}\). To avoid possible ambiguity, the simplification rules are: \(\mathcal {D}_{m}\equiv \mathcal {D}_{m}(t_j)\), \(\varDelta \mathcal {D}_{m}\equiv \varDelta \mathcal {D}_{m}(t_j,t_{j+1})\), \(P_{m}\equiv P_{m}(t_j|{\varvec{\theta }})\), and \(P_{m\rightarrow m'}\equiv P_{m\rightarrow m'}(t_j,t_{j+1}|{\varvec{\theta }})\). Substituting Eqs. (28), (29), and (30) into Eq. (17), one obtains the complete likelihood function.
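For illustration, one conditional term of Eqs. (28)–(30) can be evaluated in log space as sketched below; the helper names and the data layout \([\mathcal {D}_{1\vee 2},\mathcal {D}_{3},\mathcal {D}_{4},\mathcal {D}_{5}]\) follow Sect. 4.2, while the transition probabilities are assumed to be precomputed from Eqs. (18) and (27).

```python
import numpy as np
from math import lgamma

def log_multinomial(counts, probs):
    """Log-pmf of a multinomial; probabilities are clipped away from 0."""
    counts = np.asarray(counts, dtype=float)
    probs = np.clip(np.asarray(probs, dtype=float), 1e-300, None)
    log_coef = lgamma(counts.sum() + 1) - sum(lgamma(c + 1) for c in counts)
    return log_coef + float(np.sum(counts * np.log(probs)))

def log_lik_step(D_prev, D_next, P12_to, P3_to):
    """One term of Eq. (28): log L1 (Eq. (29)) + log L2 (Eq. (30)).
    D = [D_{1v2}, D_3, D_4, D_5]; P12_to = [P_{1v2->1v2}, P_{1v2->3}];
    P3_to = [P_{3->3}, P_{3->4}, P_{3->5}]."""
    dD = np.abs(np.asarray(D_next, float) - np.asarray(D_prev, float))
    L1 = log_multinomial([D_prev[0] - dD[0], dD[0]], P12_to)
    L2 = log_multinomial([D_prev[1] - dD[2] - dD[3], dD[2], dD[3]], P3_to)
    return L1 + L2
```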

4.3 Transmission measures

To obtain the entropy-based measures, we reshape the five-dimensional vector \(\varvec{P}\) into \([P_{1\vee 2},P_{3\vee 4},P_{5}]^\top \), and the corresponding stochastic matrix \(\varvec{S}\) is reshaped (via the total probability theorem) accordingly. The reason to consider the compound state \(1\vee 2\) is that, as a whole, the state \(1\vee 2\) is observable; therefore, the possible errors in identifying the exposed can be marginalized out. The reason to consider the compound state \(3\vee 4\) is discussed in Sect. 3; as a result, the concept that “a high recovery probability is preferable over a high death probability” is correctly incorporated.

In addition to the entropy-based measures, using the next-generation matrix approach [35], the instantaneous reproductive ratio of the modified SEIR model can be defined as

$$\begin{aligned} R_0(t):=\frac{\alpha _1(t)}{\alpha _3(t)}+\frac{\alpha _2(t)}{\alpha _4(t)+\alpha _5(t)}.\ \end{aligned}$$
(31)

Note that the instantaneous reproductive ratio \(R_0(t)\) can be understood as the basic reproductive ratio of a tangent model, defined as a constant-parameter model with \({\varvec{\alpha }}\) equal to the instantaneous parameters \({\varvec{\alpha }}(t)\) at the reference time point t.

4.4 Modelling and computational details

As mentioned above, the whole parameter set \({\varvec{\theta }}\) involves not only the parameters of the rate matrix \(\varvec{Q}(\varvec{P}(t),t)\), i.e. \({\varvec{\alpha }}(t)\) (represented by the \(\varvec{w}, \varvec{w}'\) of the basis functions), but also the parameters representing the initial state, i.e. \({\varvec{\beta }}\). In the modelling practice, except for \(\alpha _3\), which is related to the mean incubation period, we calibrate all the other parameters (including the initial conditions) with Bayesian analysis. The mean incubation period, which is \(1/\alpha _3\) in the model, is reported in various previous studies [11, 36]; typically, it is around 5 days and in the range of [3, 7] days. Therefore, we set \(1/\alpha _3\) as an epistemic random variable within [3, 7]. The time-dependent parameters are modelled with sigmoid basis functions. The number of basis functions for each parameter is determined in an additive manner. Specifically, we start with a constant \(\alpha \) and iteratively increase the number of basis functions until the variation in the likelihood function value (Eq. (17)) or the BIC index becomes small.

Gibbs sampling with a uniform proposal distribution for each component of \({\varvec{\theta }}\) is adopted to sample from the posterior distribution. The step size of the Gibbs sampler is adaptively tuned using the acceptance rate of the Markov chain [37, 38]. The seed samples for the Gibbs sampler are selected in the neighbourhood of the posterior mode, which is obtained by a sequential Monte Carlo method [39, 40] combined with deterministic trust-region optimization [41, 42].

Fig. 2

Raw and corrected datasets of Hubei province. There were two policy changes regarding the dataset: (i) on February 12, 2020, the diagnosis criterion was temporarily relaxed, and as a result, there is an artificial jump in the number of infected; (ii) on April 17, 2020, the cumulative numbers of infected and dead were altered by a constant jump, and the cumulative number of recovered was altered by a constant drop. The jumps/drops are marked by rectangles in the figure. The populations of infected, recovered, and dead are corrected using Eq. (32). Note that for the infected the correction is made on cumulative numbers, and the instantaneous infected is then obtained by subtracting the cumulative recovered and dead

Fig. 3

The entropy rates for various regions. The figure shows the temporal evolution of entropy rate for various regions. The solid lines correspond to the posterior mean estimations, and the shaded areas correspond to \(\lbrace 10\%,20\%,\ldots ,99\%\rbrace \) credible intervals (around the posterior mean)

Fig. 4

The entropies and concentration factors for various regions. The figure shows a comparison of total entropies and concentration factors for various regions, with the violins illustrating the posterior distribution

5 Modelling results on real datasets of COVID-19

5.1 Datasets

For the studied regions, the time series of the populations of infected, recovered, and dead during January to May 2020 are used in model calibration. The data are collected from the WHO, the European CDC, and the Chinese CDC [11, 43, 44]. The regions considered in this study are: Hubei province, South Korea, Italy, Germany, Spain, and France.Footnote 9 For each region, the population size N is fixed to the most recent value reported by Worldometer [45]. We chose these countries/regions because they have the same order of population size (this is irrelevant to the entropy-based measure, which is N independent), they applied different containment strategies, and they represent different cultures. Moreover, at the time of writing of this article the peak of the epidemic waves has passed. A complete and thorough analysis of a large number of regions is out of the scope of the current study; here we focus primarily on the model and metric definition and their use.

5.2 Data correction

Due to abrupt counting policy changes and various corrections, the COVID-19 datasets for Hubei, Spain, and France not only violate the smoothness assumptionFootnote 10 of the proposed modelling framework, but also contradict the fundamental fact that a cumulative number can only be non-decreasing. Therefore, the datasets must be corrected. Each cluster/jump in the data clearly has missing information, namely the (correct) times of occurrence. To obtain a consistent dataset, we fill in this missing information using the expected times of occurrence with respect to the distribution of the previous events. Since the dataset is recorded daily, the records marginally form a multinomial distribution along the discrete time axis. It follows that the missing time information can be filled in using the daily expected number of events. Specifically, let \(t_J\) represent the time point when the jump/drop happens (for a specified compartment), and let \(\varDelta \mathcal {D}_J\) represent the magnitude of the data jump/drop. We perform a postprocessing of the dataset expressed as follows:

$$\begin{aligned} \mathcal {D}(t_i)\leftarrow \left( 1+\frac{\varDelta \mathcal {D}_J}{\mathcal {D}(t_{J})}\right) \mathcal {D}(t_i),\ \end{aligned}$$
(32)

where \(t_i=t_0,t_1,\ldots ,t_{J}\), and \(\mathcal {D}(t_i)\) represents the cumulativeFootnote 11 number at \(t_i\). Note that \(\varDelta \mathcal {D}_J\) could be negative. For an illustration of the correction, Fig. 2 shows the raw and the corrected datasets for Hubei province.
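A minimal sketch of the correction of Eq. (32) is given below; the jump \(\varDelta \mathcal {D}_J\) and its time index are assumed known from inspection of the raw series, and the rescaling is applied to the cumulative counts up to and including \(t_J\).

```python
import numpy as np

def correct_jump(cumulative, j_idx, delta_j):
    """Eq. (32): rescale all cumulative counts up to and including the
    jump time t_J by (1 + Delta_D_J / D(t_J)), redistributing the jump
    proportionally to the previously recorded events."""
    d = np.asarray(cumulative, dtype=float).copy()
    d[: j_idx + 1] *= 1.0 + delta_j / d[j_idx]
    return d

# e.g. a reporting jump of +1500 cumulative cases observed right after
# day 20 of a hypothetical series (delta_j may also be negative):
raw = np.linspace(100.0, 8000.0, 60)
corrected = correct_jump(raw, j_idx=20, delta_j=1500.0)
```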

5.3 The overall epidemic dynamics of various regions

After performing model calibration on the datasets of the various regions, we present a comparative analysis based on the transmission measures. Figure 3 shows the evolution of the entropy rate of the COVID-19 outbreak for each of the regions considered. This graph represents the time evolution of the degree of disorder (in terms of infections and deaths) introduced by the virus in an average statistical individual of the region. It reflects features of the daily evolution of infections and recoveries/deaths, but it is fundamentally different from the evolution of each compartment; in fact, it has the key property of being objective and comparable across regions. Interestingly, the evolution of the entropy rate has a similar form for each region, but the magnitude of the disorder differs significantly. In particular, the cumulative integral of the entropy rate represents the change of entropy in the system and, therefore, the total impact in a region. In Fig. 4 (top panel), we report this impact measure for each of the regions considered. Based on this metric, Spain was the most affected region, even though the epidemic wave hit the country later than Italy. At the opposite end, South Korea is the country with the smallest change in entropy, highlighting an effective combination of policies and cultural habits that limited the impact of the epidemic. This is probably due to the experience gained during the recent 2015 Middle East respiratory syndrome coronavirus (MERS-CoV) outbreak [46]. Hubei’s reaction, with extreme containment measures, also limited the overall impact of the epidemic. Germany has the smallest total entropy among the studied European countries.

Interestingly, the peaks of the entropy rate for Spain, Italy, and Germany occurred in about the same period but with different left-tail behaviour (i.e. in the growing phase). On the other hand, the right-tail behaviour (i.e. the descending phase) is similar, showing a fatter and longer tail. A similar asymmetry can also be observed in South Korea. A deviation from this “classic” behaviour is represented by Hubei, which does not show this long-tail behaviour but has a rather compact and almost symmetric shape. A surprising result is shown in Fig. 4 (bottom panel): although the impact in each country is significantly different, the concentration factors are similar, supporting the observation that the evolution of COVID-19 is similar across outbreaks. The Hubei region deviates slightly from this trend, showing a higher concentration factor, which corroborates the lack of a fat right tail and, therefore, a more pulse-like evolution.

Figure 5 shows a comparison of the instantaneous reproductive ratio and death rate, together with the date of lockdown in each region. One can infer that the lockdowns reduced \(R_0(t)\) effectively. Surprisingly, however, the most effective decrease was observed in South Korea, where no national lockdown was implemented, only local containment measures and massive early-stage testing.

It is important to note that the modelling results are associated with the optimized parsimonious model for each region. Specifically, in an optimized parsimonious model the number of time-dependent variables as well as the number of adaptive basis functions for each time-dependent variable is optimized, in the sense that increasing these numbers would not noticeably improve the calibration accuracy, while decreasing them would significantly degrade the accuracy. Finally, to illustrate the degree of accuracy the model achieves, the model calibration results for Hubei are shown in Fig. 6. The calibrations for the other regions and their limitations are reported in Appendix B.

Fig. 5

The instantaneous reproductive ratio, recovery, and death rates for various regions. The lockdown date for each region is shown as a vertical dashed line. Note that South Korea did not have a lockdown policy

Fig. 6

Modelling the overall epidemic dynamics of Hubei province with the modified SEIR model. The red line corresponds to the posterior mean estimation. The shaded area corresponds to \(\lbrace 10\%,20\%,\ldots ,99\%\rbrace \) credible intervals around the posterior mean. The parameters \(\alpha _1(t)\), \(\alpha _4(t)\), and \(\alpha _5(t)\) are modelled with a single sigmoid basis, and \(\alpha _2\) is modelled with a constant variable. The \(n_{\epsilon }\) in Eq. (21) is fixed to 100, assuming the error in the increment of infected is of the order of a few hundred. The figure suggests a highly accurate calibration on the data using at most one adaptive basis function for each parameter

5.4 Robustness on the transmission trend

A natural concern regarding the discovered transmission trend is whether the trend is a genuine underlying structure of the epidemic or merely an artificial/superficial structure arising from the specific time-dependent model. It is challenging to (perfectly) resolve this concern because a compartmental model (or any mathematical model) is inevitably an approximation of the real epidemic. Moreover, even if an exact model existed, it would still be challenging (if not impossible) to identify the model accurately due to the presence of endogenous variables. However, we can at least show that the proposed framework is self-consistent. In Appendix C, we simulate artificial epidemics from analytical SEIR laws and investigate whether the proposed modelling approach can identify the correct transmission trends.

6 Limitations and future research directions

6.1 Incorporating the undetected cases

In Sect. 5, the reported/observed population in each compartment is used to calibrate the model, and the kernel function in Eq. (21) only flattens the likelihood function instead of altering its intrinsic shape. Consequently, the model describes an epidemic scenario consistent with, but also confined by, the reported cases. An important open issue is to incorporate the undetected cases in order to fully uncover the magnitude of the epidemic. A practical modelling strategy is to introduce a probability distribution assumption on the (possibly time-dependent) ratio between reported and undetected cases and to rewrite the likelihood function similarly to Eq. (19). Clearly, the critical ingredient is the model assumption on the undetected. The ongoing studies on blood tests for antibodies of SARS-CoV-2 [47] can be useful for this future research direction.

6.2 Application to more complex compartmental models

Depending on the modelling purposes, one could introduce additional compartments, e.g. the tested/suspected, the ICU cases, the female and male, the old and young, etc., to study the interactions between different groups. It is also straightforward to include spatially distributed information through adjacency and incidence matrices. However, one should be aware that the model variance and the possibility of converging to insignificant local likelihood modes generally increase with model complexity. Therefore, it is crucial to collect robust prior knowledge regarding the modelling parameters.

Fig. 7

Illustration of entropy-based measures for systems with the same basic reproductive ratio. The final stage of \(\varvec{P}(t)\) for the two systems is identical since they have an identical reproductive ratio. However, System 2 evolves more rapidly than System 1, and this is reflected by the entropy rate and concentration factor. System 1 has a longer active period than System 2, and consequently, the total entropy of System 1 is larger

7 Conclusions

In this study, we have proposed a stochastic compartmental modelling framework for epidemics, equipped with entropy-based metrics to measure both the impact and the evolution of a pandemic event. The model belongs to the class of nonlinear Markov processes, which allows a robust formulation and a natural setting for developing entropy-based metrics. In addition, we have provided a complete Bayesian inversion scheme to calibrate the model parameters with the related uncertainties. Subsequently, we specialized the proposed structure to a modified SEIR model and to the COVID-19 pandemic. In particular, we used the framework to investigate six regions: Hubei, South Korea, Italy, Spain, Germany, and France. We showed that the change in entropy in the selected regions (which is associated with the impact of an epidemic) differs significantly. However, it is surprising to note that the dynamic evolution of the pandemic waves shows very regular trends and very similar concentration measures.