1 Introduction

Started from Wuhan, the capital of Hubei province, China in December 2019, the outbreak of 2019 novel coronavirus disease (COVID-19) has spread rapidly across more than 200 countries, areas or territories in a short period of time with so far (as of January 20, 2021) over 94.96 million confirmed cases and 2.05 million confirmed deaths (World Health Organization 2021).

The spread of COVID-19 across and within countries has not followed a homogeneous pattern (Giuliani et al. 2020). The causes of this heterogeneity are not yet clearly identified, but different countries have different levels of national capacity based on their abilities in prevention, detection, response strategies, enabling function, and operational readiness (Kandel et al. 2020). Besides, different countries have implemented different levels of rigorous quarantine and control measures to prevent and contain the epidemic, which affect the population movement and hence the spread pattern of COVID-19. Given the highly contagious nature of COVID-19, the spatial pattern of the spread of the disease changes rapidly over time. Thus, understanding the spatio-temporal dynamics of the spread of COVID-19 in different countries is undoubtedly critical.

The spatial or geographical distribution of relative location of incidence (new cases) of COVID-19 in a country is important in the analyses of the disease risk across the country. In disease mapping studies, the spatial domain of interest is partitioned into a number of contiguous smaller areas, usually defined by administrative divisions such as provinces, counties, municipalities, towns or census tracts, and the aim of the study is to estimate the relative risk of each area at different times (Lee 2011; Lawson 2018). Spatio-temporal models are then required to explain and predict the evolution of incidence and risk of the disease in both space and time simultaneously (Anderson and Ryan 2017).

Estimation of area-specific risks over time provides information on the disease burden in specific areas and identifies areas with elevated risk levels (hot spots). In addition, identifying the changes in the spatial patterns of the disease risk over time may result in detecting either regional or global trends, and contributes to make informed and timely public health resource allocation (Wakefield 2007).

To account for the underlying temporal and spatial autocorrelation structure in the spread of COVID-19, available data on the daily number of new cases and deaths in different countries/regions have already been analyzed in a considerable number of studies. For example, Kang et al. (2020) used Moran’s I spatial statistic with various definitions of neighbors and observed a significant spatial association of COVID-19 in daily number of new cases in provinces of mainland China. Gayawan et al. (2020) used a zero-inflated Poisson model for the daily number of new COVID-19 cases in the African continent and found that the pandemic varies geographically across Africa with notable high incidence in neighboring countries. Briz-Redón and Serrano-Aroca (2020) conducted a spatio-temporal analysis for exploring the effect of daily temperature on the accumulated number of COVID-19 cases in the provinces of Spain. They found no evidence suggesting a relationship between the temperature and the prevalence of COVID-19 in Spain. Gross et al. (2020) studied the spatio-temporal spread of COVID-19 in China and compare it to other global regions and concluded that human mobility/migration from Hubei and the spread of COVID-19 are highly related. Danon et al. (2020) combined 2011 census data to capture population sizes and population movement in England and Wales with parameter estimates from the outbreak in China, and found that the COVID-19 outbreak is going to peak around 4 months after the start of person-to-person transmission. Using linear regression, multilayer perceptron and vector autoregression, Sujath et al. (2020) modeled and forecasted the spread of COVID-19 cases in India.

As pointed out in Alamo et al. (2020), there are many national and international organizations that provide open data on the number of confirmed cases and deaths. However, these data often suffer from incompleteness and inaccuracy, which are considerable limitations for any analyses and modeling conducted on the available data on COVID-19 (Langousis and Carsteanu 2020). We highlight that we are yet in the center of the pandemic crisis and due to the public health problem, and also to the severe economical situation, we do not have access to all sources of data. Thus reseachers know only a portion of all the elements related to COVID-19. In addition, data on many relevant variables such as population movement and interaction, and the impact of quarantine and social distancing policies are not either available or accurately measured. Combined with the unknown nature of the new COVID-19 virus, any analysis such as the present study only provides an approximate and imprecise description of the underlying spatio-temporal dynamics of the pandemic. Nevertheless, having a vague idea is better than having no idea, and the results should be interpreted with caution.

Currently, a wealth of studies have appeared in the very recent literature. Many of them follow the compartmental models in epidemiology, partition the population into subpopulations (compartments) of susceptible (S), exposed (E), infectious (I) and recovered (R), and fit several variations of the classical deterministic SIR and SEIR epidemiological models (Peng et al. 2020; Roda et al. 2020; Bastos and Cajueiro 2020). We believe that considering stochastic components is important, if not essential, to explain the complexity and heterogeneity of the spread of COVID-19 over time and space. For this reason, in the present work we propose a spatio-temporal stochastic modeling approach that is able to account for the spatial, temporal and interactions effects, together with possible deterministic covariates.

We acknowledge that the proposed model in its current form requires development and refinements as more information becomes available, but at the stage of the pandemic we are now, it can provide a reasonable modeling framework for the spatio-temporal spread of COVID-19. This is illustrated by modeling the daily number of new confirmed cases in Spain, Italy and Germany from late February 2020 to mid January 2021. The R code for implementing the proposed model can be made available upon request. We also provide a Shiny web application (Chang et al. 2020) based on the model discussed in this paper at https://ajalilian.shinyapps.io/shinyapp/.

The structure of the paper is the following. The open data resources used in this study are introduced in Sect. 2. A model for the daily number of regional cases is considered in Sect. 3. As described in Sect. 4, this model explains the spatio-temporal variations in the relative risk of each country in terms of a number of temporal, spatial and spatio-temporal random effects. The results of fitting the considered model to the number of daily confirmed cases in Spain, Italy and Germany are given in Sect. 5. The paper concludes in Sect. 6 with some few last remarks.

2 Data on the daily number of COVID-19 cases

Governmental and non-governmental organizations across the world are collecting and reporting regional, national and global data on the daily number of confirmed cases, deaths and recovered patients and provide open data resources. Incompleteness, inconsistency, inaccuracy and ambiguity of these open data are among limitations of any analysis, modeling and forecasting based on the data (Alamo et al. 2020). Particularly, the number of cases mainly consist of cases confirmed by a laboratory test and do not include infected asymptomatic cases and infected symptomatic cases without a positive laboratory test.

In this study, we focus on the daily number of confirmed cases in Spain, Italy and Germany and used the following open data resources.

Spain::

DATADISTA, a Spanish digital communication medium that extracts the data on confirmed cases (PCR and antibody test) registered by the Autonomous Communities of Spain and published by the Ministry of Health and the Carlos III Health Institute. DATADISTA makes the data available in an accessible format at the GitHub repository https://github.com/datadista/datasets/tree/master/COVID The daily accumulated number of total confirmed cases registered in the 19 Autonomous Communities of Spain are updated by DATADISTA on a daily bases.

Italy::

Data on the daily accumulated number of confirmed cases in the 20 regions of Italy are reported by the Civil Protection Department (Dipartimento della Protezione Civile), a national organization in Italy that deals with the prediction, prevention and management of emergency events. These data are available at the GitHub repository https://github.com/pcm-dpc/COVID-19 and are being constantly updated.

Germany::

The Robert Koch Institute, a federal government agency and research institute responsible for disease control and prevention, collects data and publishes official daily situation reports on COVID-19 in Germany. Data on the daily accumulated number of confirmed cases in the 16 federal states of Germany extracted from the situation reports of the Robert Koch Institute are available at the GitHub repository https://github.com/jgehrcke/covid-19-germany-gae and are being updated on a daily basis.

Table 1 Considered countries with their corresponding number of regions (m), length of study period and the estimated country-wide daily incidence rate (\({\widehat{\varrho }}_{0}\))

Table 1 summarizes the number of regions, study period and country-wide daily incidence rate of the data for each country.

Data on distribution population of the considered countries are extracted from the Gridded Population of the World, Version 4 (GPWv4), which provides estimates of the number of persons per pixel (1\(^{\circ }\) resolution) for the year 2020 (Center International Earth Science Information Network (CIESIN) Columbia University 2018). These data are consistent with national censuses and population registers.

Fig. 1
figure 1

Boxplots of the daily number of COVID-19 cases \(y_{i1},\ldots ,y_{iT}\) (left) and population \(P_{i}\) (right) in each region (first level administrative division), \(i=1,\ldots ,m\), of Spain, Italy and Germany

Fig. 2
figure 2

Boxplots (yellow boxes with black whiskers) of logarithm of the regional number of COVID-19 cases \(\log (1 + y_{1t}),\ldots ,\log (1+y_{mt})\) in each day \(t=1,\ldots ,m\) for Spain, Italy and Germany. The red solid line connects the medians of boxplots and depicts the country-wide temporal trends and variations, and the blue dashed line is the overall median of all cases during the study period

3 Modeling daily regional counts

Suppose that a country, the spatial domain of interest, is partitioned into regions \(A_{1},\ldots ,A_{m}\), defined by administrative divisions such as states, provinces, counties, etc (see Table 1). Let \(Y_{it}\) denote the number of new COVID-19 cases in region \(A_{i}\), \(i=1,\ldots ,m\), at time (day) \(t=1,\ldots ,T\).

Figure 1 shows boxplots of the observed number of new cases \(Y_{i1},\ldots ,Y_{iT}\) for each region \(i=1,\ldots ,m\). Clearly, observed values of \(Y_{i1},\ldots ,Y_{iT}\) are heavily right-skewed and directly related to the population of region \(A_{i}\). Boxplots of the observed values of \(\log (1+Y_{1t}),\ldots ,\log (1+Y_{mt})\) for each day \(t=1,\ldots ,m\) are shown in Fig. 2. The roughly symmetric boxplots in the logarithmic scale imply right skewness of the regional counts at each day. Figure 2 also depicts temporal trends and variations of the observed number of COVID-19 cases which are due to restrictions and lockdowns imposed by national or regional government authorities and other relevant factors.

Thus, it is reasonable to assume that the expected number of new COVID-19 cases in region \(A_{i}\) at time t is given by

$$\begin{aligned} E_{it} = {\mathbb{E}}[Y_{it}] = P_{i} \varrho _{it}, \quad i=1,\ldots ,m,t=1,\ldots ,T, \end{aligned}$$

where \(P_{i}\) is the population of region \(A_{i}\) and \(\varrho _{it}>0\) is the incidence rate of COVID-19 in region \(A_{i}\) at time t.

3.1 The null model of homogeneous incidence rates

Under the null model of spatial and temporal homogeneity of the incidence rate, we have that \(\varrho _{it}=\varrho _{0}\), and

$$\begin{aligned} {\widehat{E}}_{it} = P_{i}{\widehat{\varrho }}_{0}, \end{aligned}$$
(1)

provides an estimate for \(E_{it}\), where

$$\begin{aligned} {\widehat{\varrho }}_{0} = \frac{1}{T} \sum _{t=1}^{T} \frac{\sum _{i=1}^{m} Y_{it}}{\sum _{i=1}^{m} P_{i}} \end{aligned}$$

is an estimate of the country-wide homogeneous daily incidence rate (Waller et al. 1997). The estimated daily incidence rate per million population (\(10^{6}{\widehat{\varrho }}_{0}\)) so far is around 156, 123 and 79 for Spain, Italy and Germany, respectively (see Table 1).

3.2 Distribution of daily regional counts

Consul and Jain (1973) introduced a generalization of the Poisson distribution, which is a suitable model to most unimodal or reverse J-shaped counting distributions with long right tails. In fact, the generalized Poisson distribution is a mixture of Poisson distributions and, compared to the ordinary Poisson and negative binomial distributions, it is a more flexible model for heavily right-skewed count data (Joe and Rong 2005; Zamani and Ismail 2012). Given nonnegative random rates \(\varLambda _{it}\), \(i=1,\ldots ,m\), \(t=1,\ldots ,T\), following Zamani and Ismail (2012), we write

$$\begin{aligned} \Omega _{it}=\frac{\varLambda _{it}}{1 + \varphi \varLambda _{it}^{\alpha -1}}, \quad \Psi _{it} = \frac{\varphi \varLambda _{it}^{\alpha - 1}}{1 + \varphi \varLambda _{it}^{\alpha - 1}}, \end{aligned}$$

where \(\varphi \ge 0\) and \(\alpha \in {\mathbb R}\), and assume that \(Y_{it}\)’s are independent random variables following the generalized Poisson distribution with

$$\begin{aligned}&{\mathbb P}(Y_{it}=y_{it}|\varLambda _{it}, \varphi , \alpha )\\&\quad =\exp \left( -\Omega _{it} - \Psi _{it} y_{it} \right) \frac{\Omega _{it} \left( \Omega _{it} + \Psi _{it} y_{it} \right) ^{y_{it} - 1}}{y_{it}!}, \quad y_{it} =0,1,\ldots , \end{aligned}$$

Then,

$$\begin{aligned}&{\mathbb P}(Y_{it}=y_{it}|\varLambda _{it},\varphi ,\alpha ) \\&\quad = \exp \left( -\frac{\varLambda _{it} + \varphi \varLambda _{it}^{\alpha -1} y_{it}}{1 + \varphi \varLambda _{it}^{\alpha -1}} \right) \frac{\varLambda _{it} \left( \varLambda _{it} + \varphi \varLambda _{it}^{\alpha -1} y_{it} \right) ^{y_{it}-1}}{\left( 1 + \varphi \varLambda _{it}^{\alpha -1} \right) ^{y_{it}}y_{it}!} \end{aligned}$$

and

$$\begin{aligned} {\mathbb{E}}[Y_{it}|\varLambda _{it},\varphi ,\alpha ] = \varLambda _{it}, \quad {\mathbb {V}\mathrm{ar}}(Y_{it}|\varLambda _{it},\varphi ,\alpha ) = \varLambda _{it} \left( 1 + \varphi \varLambda _{it}^{\alpha -1}\right) ^{2}. \end{aligned}$$

Thus \(\varphi \) is the dispersion parameter and the case \(\varphi =0\) represents the ordinary Poisson distribution (no dispersion) with

$$\begin{aligned} {\mathbb P}\left( Y_{it} = y_{it} | \varLambda _{it} \right) = \exp \left( -\varLambda _{it}\right) \frac{\varLambda _{it}^{y_{it}}}{y_{it}!}, y_{it}=0,1,\ldots . \end{aligned}$$

Here, parameter \(\alpha \) controls the shape (power) of the relation between the conditional variance of \(Y_{it}|\varLambda _{it}\) and its conditional mean. For example, the relation between \({\mathbb {V}\mathrm{ar}}(Y_{it}|\varLambda _{it})\) and \({\mathbb{E}}[Y_{it}|\varLambda _{it}]\) is linear if \(\alpha =1\), quadratic if \(\alpha =1.5\) and cubic if \(\alpha =2\) (Zamani and Ismail 2012).

4 Modeling relative risks

The underlying random rates \(\varLambda _{it}\), \(i=1,\ldots ,m,t=1,\ldots ,T\), account for the extra variability (overdispersion), which may represent unmeasured confounders and model misspecification (Wakefield 2007). Variations of the random rate \(\varLambda _{it}\) relative to the expected number of cases \(E_{it}\) provide useful information about the spatio-temporal risk of COVID-19 in the whole spatial domain of interest during the study period.

4.1 Relative risks

In disease mapping literature, the nonnegative random quantities

$$\begin{aligned} \theta _{it} = \frac{{\mathbb{E}}[ Y_{it} | \varLambda _{it}]}{{\mathbb{E}}[Y_{it}]} = \frac{\varLambda _{it}}{E_{it}}, \quad i=1,\ldots ,m, \end{aligned}$$

are called the area-specific relative risks at time t ( Lawson 2018, Section 5.1.4). Note that the random rate \(\varLambda _{it}\) and the expected value \(E_{it}\) have the same dimension (unit is people per day) as \(Y_{it}\) and hence \(\theta _{it}\) is a dimensionless random latent variable. Since

$$\begin{aligned} {\mathbb{E}}[\varLambda _{it}]={\mathbb{E}}\left[ {\mathbb{E}}[Y_{it} \big | \varLambda _{it},\varphi ,\alpha ] \right] ={\mathbb{E}}[Y_{it}]=E_{it}, \end{aligned}$$

it follows that \({\mathbb{E}}[\theta _{it}] = {\mathbb{E}}[\varLambda _{it}] / E_{it} = 1\) and

$$\begin{aligned} {\mathbb Cov}( \theta _{it}, \theta _{i^{\prime }t^{\prime }}) = \frac{{\mathbb Cov}( \varLambda _{it}, \varLambda _{i^{\prime }t^{\prime }})}{E_{it}E_{i^{\prime }t^{\prime }}}, \end{aligned}$$

which means that the temporal and spatial correlation structure of the underlying random rates \(\varLambda _{it}\) determine the spatio-temporal correlations between \(\theta _{it}\)’s.

By ignoring these correlations, the standardized incidence ratio \({\widehat{\theta }}_{it}=Y_{it}/{\widehat{E}}_{it}\) provides a naive estimate for the relative risks (Lee 2011). However, in a model-based approach the variations of the relative risks are often related to regional and/or temporal observed covariates and the correlation between \(\theta _{it}\)’s are explained in terms of regional and/or temporal random effects using, for example, a log linear model (Wakefield 2007; Lee 2011; Lawson 2018).

4.2 A model for relative risks

In the present study, we consider the log linear model

$$\begin{aligned} M_{0}: \quad \theta _{it} = \exp \left( \mu + \beta d_{i} \right) \end{aligned}$$
(2)

as the null model, where \(\mu \) is the intercept and \(d_{i}\) is the population density of region \(A_{i}\), i.e. the population of \(A_{i}\), \(P_{i}\), divided by the area of \(A_{i}\). The population density is standardized to have mean 0 and variance 1 and \(\beta \) is its regression coefficient. However, Figs. 1 and 2 demonstrate spatio-temporal variations in daily number of COVID-19 cases which are not accounted for in the null model (2). We include zero mean random effects in the null model to account for spatio-temporal variations in relative risks due to temporal and spatial trend and correlation. Among many different possibilities, we consider the following models

$$\begin{aligned} M_{1}: \quad&\theta _{it} = \exp \left( \mu + \beta d_{i} + \delta _{t} \right) , \\ M_{2}: \quad&\theta _{it} = \exp \left( \mu + \beta d_{i} + \delta _{t} + \xi _{i} \right) , \\ M_{3}: \quad&\theta _{it} = \exp \left( \mu + \beta d_{i} + \delta _{t} + \varepsilon _{t} + \xi _{i} \right) , \\ M_{4}: \quad&\theta _{it} = \exp \left( \mu + \beta d_{i} + \delta _{t} + \varepsilon _{t} + \xi _{i} + \zeta _{i} \right) , \end{aligned}$$

where \(\delta _{t}\) represents the temporal trend, \(\varepsilon _{t}\) accounts for temporal correlation and \(\xi _{i}\) and \(\zeta _{i}\) explain spatial correlation due to spatial distance and neighborhood relations among regions \(A_{1},\ldots ,A_{m}\), respectively (see Table 2). Here, \(\delta _{t}\), \(\varepsilon _{t}\), \(\xi _{i}\) and \(\zeta _{i}\) are dimensionless random effects that explain temporal and spatial variability in the relative risk \(\theta _{it}\).

Table 2 Considered terms in the additive model for the spatio-temporal random effect of the log-linear model for relative risks

The latent (stochastic) temporal trend \(\delta _{t}\) is expected to be a smooth function of t. Since the second order random walk (RW2) model is appropriate for representing smooth curves (Fahrmeir and Kneib 2008), \({\varvec{\delta }}=(\delta _{1},\ldots ,\delta _{T})\) is assumed to follow a RW2 model, i.e.,

$$\begin{aligned} \Delta ^{2}\delta _{t+1} = \delta _{t+1} - 2 \delta _{t} + \delta _{t-1} =\epsilon _{t}, \quad t=2, \ldots , T-1, \end{aligned}$$

where \(\epsilon _{2},\ldots ,\epsilon _{T-1}\) are independent and identically distributed (i.i.d.) zero mean Gaussian random variables with variance \({1}/{\tau _{\delta }}\). Here the precision parameter \(\tau _{\delta }>0\) acts as a smoothing parameter enforcing small or allowing for large variations in \(\delta _{t}\) (Fahrmeir and Kneib 2008).

To account for temporal correlation, we assume that \(\varepsilon _{t}\) follows a stationary autoregressive model of order 2, AR(2); i.e.,

$$\begin{aligned} \varepsilon _{t} = \psi _{1} (1 - \psi _{2}) \varepsilon _{t-1} + \psi _{2} \varepsilon _{t-2} + \epsilon ^{\prime }_{t}, \quad t=2, \ldots , T, \end{aligned}$$

where \(-1<\psi _{1}<1\) and \(-1<\psi _{2}<1\) are the first and second partial autocorrelations of \(\varepsilon _{t}\) and \(\epsilon ^{\prime }_{2},\ldots ,\epsilon ^{\prime }_{T}\) are i.i.d. zero mean Gaussian random variables with variance \(1/\tau _{\varepsilon }\).

On the other hand, the neighborhood structure of regions \(A_{1},\ldots , A_{m}\) may induce spatial correlation among relative risks of regions because neighboring regions often tend to have similar relative risks. To include spatial correlation due to neighborhood structure of regions in the model, we assume that \({\varvec{\xi }}=(\xi _{1},\ldots ,\xi _{m})\) follows a scaled version of the Besag–York–Mollié (BYM) model (Besag et al. 1991), i.e., \({\varvec{\xi }}\) is a zero mean Gaussian random vector with (Riebler et al. 2016)

$$\begin{aligned} {\mathbb{V}}{\mathrm{ar}}({\varvec{\xi }}) = \frac{1}{\tau _{\xi }} \left( (1-\phi ) {\mathbf {Q}}^{-} + \phi {\mathbf {I}}_{m} \right) . \end{aligned}$$

Here \({\mathbf {Q}}^{-}\) denotes the generalized inverse of the \(m\times m\) spatial precision matrix \({\mathbf {Q}}=[Q_{ii^{\prime }}]\) with entries

$$ Q_{{ii^{\prime } }} = \left\{ {\begin{array}{*{20}l} {n_{i} } \hfill &\quad {i = i^{\prime } } \hfill \\ { - 1} \hfill &\quad {i\sim i^{\prime } } \hfill \\ 0 \hfill &\quad {{\text{otherwise}}} \hfill \\ \end{array} } \right. $$

where \(n_{i}\) is the number of neighbors of region \(A_{i}\) and \(i\sim i^{\prime }\) means that regions \(A_{i}\) and \(A_{i^{\prime }}\) share a common border. The parameter \(\tau _{\xi }>0\) represents the marginal precision and \(0\le \phi \le 1\) indicates the proportion of the marginal variance explained by the neighborhood structure of regions (Riebler et al. 2016).

In addition to neighborhood interconnectivity, to account for spatial correlations due to spatial distance we assume that \({\varvec{\zeta }}=(\zeta _{1},\ldots ,\zeta _{m})\) follows a Gaussian Markov random field (GMRF). More specifically, we assume that \({\varvec{\zeta }}\) is a zero mean Gaussian random vector with the structured covariance matrix

$$\begin{aligned} {\mathbb {V}\mathrm{ar}}({\varvec{\zeta }}) = \frac{1}{\tau _{\zeta }} \left( {\mathbf {I}}_{m} - \frac{\omega }{e_{\max }} {\mathbf {C}}\right) ^{-1} \end{aligned}$$

where \({\mathbf {I}}_{m}\) is the \(m\times m\) identity matrix, \(0\le \omega <1\) and \(e_{\max }\) is the largest eigenvalue of the \(m\times m\) symmetric positive definite matrix \({\mathbf {C}} = [{C}_{ii^{\prime }}]\). The entry \(C_{ii^{\prime }}\) of matrix \({\mathbf {C}}\) represents to what extend the regions \(A_{i}\) and \(A_{i^{\prime }}\) are interconnected. For example, \({C}_{ii^{\prime }}\) can be related to a data on commuting or population movement between regions \(A_{i}\) and \(A_{i^{\prime }}\). In absence of most recent and reliable movement data between the regions of Spain, Italy and Germany, we set \(C_{ii^{\prime }}\) to be the Euclidean distance between the centroids of \(A_{i}\) and \(A_{i^{\prime }}\).

4.3 Prior specification and implementation

In a Bayesian framework, it is necessary to specify prior distributions for all unknown parameters of the considered model. The Gaussian prior with mean zero and variance \(10^{6}\) is considered as a non-informative prior for the dispersion parameter of generalized Poisson distribution, \(\log \varphi \), and for the parameters of the log linear model for the relative risks \(\mu \), \(\beta \), \(\log \tau _{\delta }\), \(\log \tau _{\varepsilon }\), \(\log \tau _{\zeta }\), \(\log \tau _{\xi }\), \(\log \frac{\omega }{1-\omega }\), \(\log \frac{\phi }{1-\phi }\), \(\log \frac{1+\psi _{1}}{1-\psi _{1}}\) and \(\log \frac{1+\psi _{2}}{1-\psi _{2}}\). The prior distribution for the \(\alpha \) parameter of the generalized Poisson distribution is considered to be a Gaussian distribution with mean 1.5 and variance \(10^{6}\). Table 3 summarizes the model parameters and their necessary transformation for imposing the non-informative Gaussian priors.

Table 3 Parameters of the considered models \(M_{0} \subset M_{1}\subset \cdots \subset M_{4}\) for the daily number of new cases, their transformation for non-informative uniform (flat) priors and initial values

Since all random effects of the model are Gaussian, the integrated nested Laplace approximation (INLA) method (Rue et al. 2009) can be used for deterministic fast approximation of posterior probability distributions of the model parameters and latent random effects (Martins et al. 2013; Lindgren and Rue 2015). The R-INLA package, an R interface to the INLA program and available at www.r-inla.org, is used for the implementation of the Bayesian computations in the present work. The R code can be made available upon request. The initial values for all parameters in the INLA numerical computations are set to be the mean of their corresponding prior distribution. The initial value of \(\alpha \) is chosen to be 1.5 (see Table 3).

4.4 Bayesian model posterior predictive checks

Let \({\varvec{\vartheta }}\) denote the vector of all model parameters and

$$\begin{aligned} p(y_{it}|{\varvec{\vartheta }}) = {\mathbb P}\left( Y_{it}=y_{it}|{\varvec{\vartheta }}\right) \end{aligned}$$

be the likelihood function for the observed count \(y_{it}\). The deviance information criterion (DIC)

$$\begin{aligned} {\mathrm{DIC}} = -2 \sum _{i=1}^{m}\sum _{t=1}^{T} \left\{ 2 {\mathbb{E}}_{\mathrm{post}}\left[ \log p(y_{it}|{\varvec{\vartheta }}) \right] - \log p(y_{it}|\widehat{\varvec{\vartheta }}) \right\} , \end{aligned}$$

and the Watanabe–Akaike information criterion (WAIC)

$$\begin{aligned} {\mathrm{WAIC}} = -2\sum _{i=1}^{m}\sum _{t=1}^{T} \left\{ \log p(y_{it}|\widehat{\varvec{\vartheta }}) - {\mathbb {V}\mathrm{ar}}_{\mathrm{post}}\big ( \log p(y_{it}|{\varvec{\vartheta }}) \big ) \right\} \end{aligned}$$

are two widely used measures of overall model fit, where \(\widehat{\varvec{\vartheta }}\) is the posterior mode (Bayes estimate) of \({\varvec{\vartheta }}\) and \({\mathbb{E}}_{\mathrm{pots}}\) and \({\mathbb {V}\mathrm{ar}}_{\mathrm{pots}}\) denote expectation and variance with respect to the posterior distribution of \({\varvec{\vartheta }}\) (Gelman et al. 2014).

For count data \(Y_{it}\) and in a Bayesian framework, a probabilistic forecast is a posterior predictive distribution on \({\mathbb {Z}}_{+}\). It is expected to generate values that are consistent with the observations (calibration) and concentrated around their means (sharpness) as much as possible (Czado et al. 2009). Following a leave-one-out cross-validation approach, let

$$\begin{aligned} {\mathcal{B}}_{-(it)} = \bigcap _{i^{\prime }\ne i=1}^{m} \bigcap _{t^{\prime }\ne t=1}^{T} \left\{ Y_{i^{\prime }t^{\prime }}=y_{i^{\prime }t^{\prime }} \right\} \end{aligned}$$

be the event of observing all count values except the one for region \(A_{i}\) at time t. Dawid (1984) proposed the cross-validated probability integral transform (PIT)

$$\begin{aligned} {\mathrm{PIT}}_{it}(y_{it}) = {\mathbb {P}}\left( Y_{it} \le y_{it} \big | {\mathcal{B}}_{-(it)} \right) = {\mathbb{E}}_{\mathrm{post}}^{-(it)}\left[ {\mathbb P}\big (Y_{it}\le y_{it}|{\varvec{\vartheta }}\big ) \right] \end{aligned}$$

for calibration checks, where \({\mathbb{E}}_{\mathrm{post}}^{-(it)}\) denotes expectation with respect to the posterior distribution of \({\varvec{\vartheta }}\) based on the leave-one-out data \({\mathcal{B}}_{-(it)}\). Thus, \({\mathrm{PIT}}_{it}\) is simply the value that the predictive distribution function of \(Y_{it}\) attains at the observation point \(y_{it}\). The conditional predictive ordinate (CPO)

$$\begin{aligned} {\mathrm{CPO}}_{it}(y_{it}) = {\mathbb {P}}\left( Y_{it} = y_{it} \big | {\mathcal{B}}_{-(it)}\right) ={\mathbb{E}}_{\mathrm{post}}^{-(it)}\left[ p(y_{it}|{\varvec{\vartheta }}) \right] \end{aligned}$$

is another Bayesian model diagnostic. Small values of \({\mathrm{CPO}}_{it}(y_{it})\) indicate possible outliers, high-leverage and influential observations (Pettit 1990). Moreover, the Bayesian leave-one-out cross-validation

$$\begin{aligned} {\mathrm{BCV}} = \sum _{i=1}^{m}\sum _{t=1}^{T} \log {\mathrm{CPO}}_{it}(y_{it}) \end{aligned}$$
(3)

computes predictive accuracy (Gelman et al. 2014).

For count data, Czado et al. (2009) suggested a nonrandomized yet uniform version of the PIT with

$$\begin{aligned} F_{it}(u|y_{it}) = \left\{ \begin{array}{ll} 0, &{}\quad u< {\mathrm{PIT}}_{it}(y_{it} - 1), \\ \frac{u - {\mathrm{PIT}}_{it}(y_{it} - 1)}{{\mathrm{PIT}}_{it}(y_{it}) - {\mathrm{PIT}}_{it}(y_{it} - 1)}, &{}\quad {\mathrm{PIT}}_{it}(y_{it} - 1) \le u < {\mathrm{PIT}}_{it}(y_{it}), \\ 1, &{}\quad u \ge {\mathrm{PIT}}_{it}(y_{it}), \end{array} \right. \end{aligned}$$

which is equivalent to

$$\begin{aligned}&F_{it}(u|y_{it})\\&\quad = \left( 1 - \frac{{\mathrm{PIT}}_{it}(y_{it}) - u}{ {\mathrm{CPO}}_{it}(y_{it})} \right) {\mathbb {1}}[ 0 < {{\mathrm{PIT}}}_{it}(y_{it}) - u \le {\mathrm{CPO}}_{it}(y_{it})] \\&\quad \quad + {\mathbb {1}}[{{\mathrm{PIT}}}_{it}(y_{it}) - u \le 0], \end{aligned}$$

where \({\mathbb {1}}[\,\cdot \,]\) is the indicator function, because

$$\begin{aligned} {{\mathrm{PIT}}}_{it}(y_{it} - 1)&= {\mathbb {P}}\left( Y_{it} \le y_{it} - 1 \big | {\mathcal{B}}_{-(it)}\right) \\&= {\mathbb {P}}\left( Y_{it} \le y_{it} \big | {\mathcal{B}}_{-(it)}\right) - {\mathbb {P}}\left( Y_{it} = y_{it} \big | {\mathcal{B}}_{-(it)}\right) \\&= {{\mathrm{PIT}}}_{it}(y_{it}) - {\mathrm{CPO}}_{it}(y_{it}). \end{aligned}$$

The mean PIT

$$\begin{aligned} {\overline{F}}(u) = \frac{1}{mT} \sum _{i=1}^{m}\sum _{t=1}^{T} F_{it}(u|y_{it}), \quad 0\le u\le 1, \end{aligned}$$

can then be comparing with the standard uniform distribution for calibration. For example, a histogram with heights

$$\begin{aligned} {\overline{f}}_{j} = {\overline{F}}\left( \frac{j}{J}\right) - {\overline{F}}\left( \frac{j-1}{J}\right) , \quad j=1,\ldots ,J, \end{aligned}$$

and equally spaced bins \(\left[ (j-1)/J,j/J\right) \), \(j=1,\ldots ,J\), can be compared with its counterpart from the standard uniform distribution with \(f_{j}=1/J\). Any departure from uniformity indicates forecast failures and model deficiencies. As mentioned in Czado et al. (2009), U-shaped (reverse U-shaped) histograms indicate underdispersed (overdispersed) predictive distributions and when central tendencies of the predictive distributions are biased, the histograms are skewed.

5 Results

The DIC, WAIC and BCV criteria for the fitted nested models \(M_{0} \subset M_{1}\subset \cdots \subset M_{4}\) are presented in Table 4. Compared with the null model \(M_{0}\), it can be seen that including the smooth temporal trend \(\delta _{t}\) and spatial dependence due to neighborhood structure \(\xi _{i}\) in the models for all three countries results in considerable improvement in the model fit (smaller DIC and WAIC) and prediction accuracy (larger BCV). Except for DIC criterion for Germany, presence of small scale temporal trends due to temporal correlation \(\varepsilon _{t}\) in the model slightly improves the model fit and prediction accuracy. Finally, including the spatial dependence due to distance between regions \(\zeta _{i}\) in the model leads to slightly worse model fit and prediction accuracy.

Table 4 Deviance information criterion (DIC), Watanabe–Akaike information criterion (WAIC) and Bayesian leave-one-out cross-validation (BCV) for the considered models \(M_{0},M_{1},\ldots ,M_{4}\) fitted to the daily number of new COVID-19 cases in Spain, Italy and Germany

Table 5 presents the Bayesian estimates (posterior means) for every parameter of the full model \(M_{4}\) fitted to the daily number of new COVID-19 cases in Spain, Italy and Germany. The corresponding 95% credible intervals of the model parameters are also reported in parentheses.

Table 5 Posterior mean and 95% credible interval (in parentheses) for every parameter of the considered model fitted to the daily number of new COVID-19 cases in Spain, Italy and Germany

Comparing the estimated parameters among different countries, it can be seen that the dispersion parameter \(\varphi \) of the generalized Poisson distribution for Spain is lower than Italy and Germany, but its shape parameter \(\alpha \) is around 1.5 for the three countries, which implies that the variance of the daily counts in each region is approximately a quadratic function of their mean. The coefficient of the population density is not significantly different from zero for Spain and Italy, but it is positive for Germany which indicates that regions with higher population density have larger relative risks. The opposite signs of \(\psi _{1}\) and \(\psi _{2}\) indicate rough oscillations in \(\varepsilon _{t}\), which explains the small scale oscillatory temporal trends for the three countries, especially Germany.

The contribution of each random effect on the overall variation of relative risks can be quantified by

$$\begin{aligned} \gamma _{a} = \frac{1/\tau _{a}}{ 1/\tau _{\delta } + 1/\tau _{\xi } + 1/\tau _{\varepsilon } + 1/\tau _{\zeta }}, \quad a=\delta ,\xi ,\varepsilon ,\zeta . \end{aligned}$$

Using the joint posterior distribution of precision parameters \(\tau _{\delta },\tau _{\xi },\tau _{\varepsilon },\tau _{\zeta }\), estimates of the posterior means of \(\gamma _{\delta }\), \(\gamma _{\xi }\), \(\gamma _{\varepsilon }\) and \(\gamma _{\zeta }\) are obtained and summarized in Table 6. In line with model fit and prediction accuracy criteria in Tables 4 and 6 reveals that more than 98% of the variations in the relative risks are explained by the temporal trend \(\delta _{t}\). Also, the spatial dependence due to neighborhood relation between regions \(\xi _{i}\) and small scale temporal trends \(\varepsilon _{t}\) have minor contributions in the total variations of the relative risks for the three countries. The random effect term \(\zeta _{i}\) representing spatial dependence due to distance between regions has a negligible effect on the relative risks.

Table 6 Contribution (in percent) of each random effect term in the full model \(M_{4}\) on the overall variation of relative risks
Fig. 3
figure 3

Smoothed temporal trend of the relative risks of COVID-19, obtained from posterior mean and 95% credible interval of the structured temporal random effect of the fitted full model \(M_{4}\)

The Bayesian estimates and 95% credible intervals for the temporal trend \(\delta _{t}\), \(t=1,\ldots ,T\), are shown in Fig. 3. These plots can be interpreted as a smoothed temporal trend of the relative risk in the whole country. In fact, Fig. 3 suggests that the COVID-19 epidemic in all three countries rapidly reached their first peaks in the middle of March and started to decline at the beginning of April and then increased and reached its maximum in the middle of November. The second wave of the epidemic seems to be differently affecting each of the three countries; it declined in December for Spain and Italy, but seems to be persistent in Germany. In addition, a third peak at the beginning of January is also apparent for Spain, which may be related to end-of-the-year holidays.

Fig. 4
figure 4

Posterior mean of the spatial random effects \(\xi _{i}\) representing dependence due to neighborhood relation between regions (left) and \(\zeta _{i}\) representing dependence due to distance between regions (right) in the fitted full model \(M_{4}\)

Figure 4 shows the posterior means of the spatial random effects \(\zeta _{i}\) and \(\xi _{i}\), \(i=1,\ldots ,m\), on the corresponding map of each country. The plot illustrates spatial heterogeneity of the relative risk of COVID-19 across regions in each country. Once again, the negligibility of \(\zeta _{i}\) can be seen in Fig. 4. Regions with positive (negative) \(\xi _{i} + \zeta _{i}\) values are expected to have elevated (lower) relative risks than the baseline country-wide risk during the study period.

Fig. 5
figure 5

Observed value (solid line), predicted value (dashed line) and 95% prediction interval (grey area) for the daily number of new COVID-19 cases in the whole country, based on the posterior mean and 95% credible interval of the spatially accumulated relative risks of the fitted model

In order to see how the estimated relative risks under the fitted full model \(M_{4}\) are in agreement with the observed data, Fig. 5 shows the spatially accumulated daily number of cases \(\sum _{i=1}^{m}Y_{it}\), \(t=1,\ldots , T\), and their expected values under the fitted model \(M_{4}\), namely the posterior mean and 95% credible interval of \(\sum _{i=1}^{m}{\widehat{E}}_{it}{\widehat{\theta }}_{it}\), \(t=1,\ldots , T\). Except some discrepancies for Spain and Italy, the observed values are inside the 95% credible intervals and close to the expected values under the fitted model. Figure 5 in addition shows 4-days ahead forecasts of the total daily number of new cases at the end of study period of each country.

Finally, histograms of the normalized PIT values described in Sect. 4.4 are obtained using \(J=20\) from the fitted full models and plotted in Fig. 6. The normalized PIT values for the fitted models to data do not show a clear visible pattern and the histograms seem to be close to the standard uniform distribution.

Fig. 6
figure 6

Histograms of normalized PIT values obtained from the fitted full model \(M_{4}\) to check for uniformity

The above results and more details on observed and predicted values from the fitted full model are also provided in an interactive Shiny web application at https://ajalilian.shinyapps.io/shinyapp/.

6 Concluding remarks

There are some limitations in the analyses and modeling of data on the number of new cases of COVID-19, including data incompleteness and inaccuracy, unavailability or inaccuracy of relevant variables such as population movement and interaction, as well as the unknown nature of the new COVID-19 virus. Nevertheless, understanding the underlying spatial and temporal dynamics of the spread of COVID-19 can result in detecting regional or global trends and to further make informed and timely public health policies such as resource allocation.

In this study, we used a spatio-temporal model to explain the spatial and temporal variations of the relative risk of the disease in Spain, Italy and Germany. Despite data limitations and the complexity and uncertainty in the spread of COVID-19, the model was able to grasp the temporal and spatial trends in the data.

Obliviously, there are many relevant information and covariates that can be considered in our modeling framework and improve the model’s predictive capabilities. One good possibility would be considering most recent and accurate human mobility amongst regions and replace the naive distance matrix \({\mathbf {C}}\) in the covariance matrix of the random spatial effect \(\zeta _{i}\) with a matrix constructed based on such mobility data. We would expect our model would benefit from this information, which right now can not be accessed. Moreover, the considered spatio-temporal model in this paper is one instance among many possibilities. For example, one possibility is to include a random effect term in the model that represents variations due to joint spatio-temporal correlations; e.g., a separable sptaio-temporal covariance structure. However, the considered nested models were adequate and no joint spatio-temporal random effect term was considered to avoid increasing the model’s complexity.

We focused here on a stochastic spatio-temporal model as a good alternative to existing deterministic compartmental models in epidemiology to explain the spatio-temporal dynamics in the spread of COVID-19. However, it should be emphasized that one step forward would be considering a combination of a deterministic compartmental model in terms of differential equations for the number of susceptible, exposed, infectious and recovered cases with our sort of stochastic modeling approach. This is a clear novelty and a direction for future research.