1 Introduction

The coronavirus disease 2019 (COVID-19) pandemic has already claimed over 4.8 million lives, with more than 235 million confirmed cases worldwide (World Health Organization 2021). The global health and economic crisis caused by this new coronavirus has highlighted the major role that vaccination could play in controlling the incidence, hospitalization and lethality caused by this disease, with more than 50 novel vaccine candidates already in clinical trials (Gallagher et al. 2021). More than 6000 million vaccine doses have now been administered worldwide (World Health Organization 2021). In Spain, more than 70 million doses have been administered and almost 37 million people, out of a population of about 47 million, have received the required number of doses, so nearly 80% of the population is already fully vaccinated (Ministerio de Sanidad, Consumo y Bienestar Social 2021). Four vaccines approved by the European Medicines Agency (EMA), which have shown variable protection against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants, have been administered in Spain so far: the lipid nanoparticle (LNP)-formulated mRNA vaccines BNT162b2 (Pfizer/BioNTech) (Sahin et al. 2020) and mRNA-1273 (Moderna) (Jackson et al. 2020), and the adenovirus (Ad)-based vaccines ChAdOx1 nCoV-19 (University of Oxford/AstraZeneca) (Mercado et al. 2020) and Ad26.COV2.S (Johnson & Johnson/Janssen) (Arashkia et al. 2021). However, opinions about vaccine effectiveness (VE) are mixed, given that this virus is easily transmitted through aerosols in poorly ventilated places (Klompas et al. 2020; Prather et al. 2020) and that asymptomatic carriers exist (Wilmes et al. 2021). The diverse mechanisms of action of the new vaccines are also a subject of debate (Leshem and Lopman 2021).
Vaccines can act through a first mechanism, known as sterilising immunity, which blocks infection entirely so that vaccinated people cannot transmit the virus, or through a second mechanism that stops the progression of symptoms after infection occurs (Gallagher et al. 2021). In this second case, however, people can still transmit the infection to others, and the coronavirus can continue mutating and creating harmful new variants, which renders this fight even more difficult. Currently, there are four variants of concern, Alpha, Beta, Gamma and Delta, which were originally identified in the UK, South Africa, Brazil and India, respectively, along with an increasing number of other recently identified SARS-CoV-2 variants (Tao et al. 2021). In particular, the rapid spread of the Delta variant is a great risk to global public health (Farinholt et al. 2021), as it is highly transmissible and contains mutations that confer partial immune escape (Riemersma et al. 2021). Indeed, it has been shown that vaccination reduces transmission of Delta, but by less than it does for the Alpha variant (Eyre et al. 2021). In this context, it is not yet clear whether the administered vaccines can provide protection against all coexisting variants and other variants expected to emerge in the near future. Furthermore, the durability of responses after COVID-19 vaccination is limited and new doses are required to keep immunity against this pathogen strong (Widge et al. 2021). Thus, in this study, we have performed a spatio-temporal analysis to explore the large-scale effect of vaccination on COVID-19 incidence and lethality in Spain. The main goal is to analyze whether vaccination levels have already shown an association with a change in either the incidence or the lethality rates.

2 Data

All the data used to conduct this study were downloaded from the public repository available at https://github.com/montera34/escovid19data. This repository is maintained by volunteers who collect data from multiple sources, including the Institute of Health Carlos III (ISCIII) of Spain. The data are provided at different levels of spatial aggregation and on a daily basis. In the present study, data are analyzed at the regional level according to the division of the country into Autonomous Communities, although only those located in the Iberian Peninsula have been considered. The study period spans from 4 January 2021 (the first day for which the datasets provide vaccination values) to 30 September 2021. Three spatio-temporal variables have been constructed for each region i and date t within the study period: COVID-19 daily incidence rate (\(I_{it}\)), COVID-19 daily lethality rate (\(L_{it}\)), and COVID-19 daily vaccination level (\(V_{it}\)). Specifically, \(I_{it}\), \(L_{it}\), and \(V_{it}\) have been computed as follows:

$$\begin{aligned}I_{it}&=100000 \times \frac{\text {Number of new cases detected in region}\,i\,\text {on date }t}{\text {Population of region}\, i} \\L_{it}&=100 \times \frac{\text {Cumulative number of deaths recorded in region}\, i\, \text {on date }t}{\text {Cumulative number of cases detected in region}\, i\, \text {on date }t} \\V_{it}&=100 \times \frac{\text {Number of residents fully vaccinated in region}\, i\, \text {on date }t}{\text {Population of region}\, i} \end{aligned}$$

The definition of lethality used here corresponds to the case fatality ratio (CFR), which is widely used for measuring the severity of a disease because of its simplicity (Ghani et al. 2005), although other alternatives are also available (Kim et al. 2021). Incidence and lethality rates for the regions of Spain during the study period are shown in Fig. 1. Vaccination levels since the beginning of 2021 for the same set of regions are shown in Fig. 2. At the end of the study period, they range from approximately 74% in Cataluña to approximately 85% in Asturias. These moderate variations across regions arise because vaccine administration in Spain is carried out by regional governments.
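As a minimal sketch of the three definitions above, the following Python functions compute \(I_{it}\), \(L_{it}\), and \(V_{it}\) from raw counts for a single region and date. The function names and the numbers in the example are hypothetical and are not taken from the escovid19data repository; returning NaN when no cases have been recorded is an assumption made here, not something stated in the text.

```python
def incidence_rate(new_cases, population):
    """Daily incidence per 100,000 inhabitants (I_it)."""
    return 100000 * new_cases / population


def lethality_rate(cum_deaths, cum_cases):
    """Case fatality ratio in percent (L_it); NaN if no cases were recorded."""
    if cum_cases == 0:
        return float("nan")  # assumption: the ratio is left undefined without cases
    return 100 * cum_deaths / cum_cases


def vaccination_level(fully_vaccinated, population):
    """Percentage of residents who are fully vaccinated (V_it)."""
    return 100 * fully_vaccinated / population


# Hypothetical figures for one region on one date (not real data).
population = 1_000_000
print(incidence_rate(new_cases=250, population=population))                # 25.0
print(lethality_rate(cum_deaths=1_500, cum_cases=100_000))                 # 1.5
print(vaccination_level(fully_vaccinated=740_000, population=population))  # 74.0
```

Note that the incidence is a daily quantity, whereas the lethality is built from cumulative counts, so it changes more slowly over time.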

Fig. 1 Evolution of incidence (a) and lethality (b) rates for the regions of Spain considered for the analysis during the period January 2021–September 2021

Fig. 2 Evolution of (complete) vaccination levels for the regions of Spain considered for the analysis during the period January 2021–September 2021

3 Methodology

3.1 Model description

A change point represents an abrupt variation in time series data, possibly as a consequence of a transition between states (Aminikhanghahi and Cook 2017). More precisely, if \(\{y_{t}\}_{t=1}^{n}\) is a time series, change point detection can be framed as a hypothesis testing problem, with a null hypothesis of the form “No change in the time series occurs” and an alternative hypothesis stating that there exists a time, \(\tau \in \{1,\ldots ,n-1\}\), such that the statistical properties of \(\{y_{1},\ldots ,y_{\tau }\}\) and \(\{y_{\tau +1},\ldots ,y_{n}\}\) differ in some way (Killick and Eckley 2014).
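To make the idea concrete, the sketch below locates a single change in mean by trying every admissible split point \(\tau \) and keeping the one that minimizes the within-segment sum of squared errors. This is only an illustration of the change point concept on toy data: it is not a formal hypothesis test, and it is not the segmented model used later in this paper. The function name is hypothetical.

```python
def best_mean_change_point(y):
    """Return (tau, cost) for the split y[:tau] | y[tau:] with the lowest total SSE."""
    def sse(segment):
        mean = sum(segment) / len(segment)
        return sum((value - mean) ** 2 for value in segment)

    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    # tau counts the observations in the first segment, so tau = 1, ..., n-1
    candidates = ((sse(y[:tau]) + sse(y[tau:]), tau) for tau in range(1, n))
    cost, tau = min(candidates)
    return tau, cost


# Toy series whose mean drops after the fourth observation.
series = [5.1, 4.9, 5.0, 5.2, 2.1, 1.9, 2.0, 2.2]
tau, cost = best_mean_change_point(series)
print(tau)  # 4
```

A test of the null hypothesis would additionally compare the cost of the best split with the cost of fitting a single mean to the whole series.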

The idea behind change point analysis can be extended to account for the effect of a temporally-varying covariate potentially associated with a variation in the values of a time series of interest. This leads to models with change points in the covariates, which can be adapted to multiple contexts, including survival analyses (Lee et al. 2020) and exposure-outcome epidemiological studies (Sarnaglia et al. 2021). In general, segmented regression models (Muggeo 2003) are the natural framework for dealing with this type of covariate effect.

In this study, a segmented spatio-temporal regression is employed for analyzing COVID-19 incidence and lethality rates at the regional level in Spain. Specifically, \(y_{it}\) denotes either the COVID-19 incidence or the lethality rate in region i (\(i=1,\ldots ,15\)) on date t (\(t=1,\ldots ,270\)). The (complete) vaccination level in region i on date t, denoted by \(V_{it}\), is the main covariate considered for the analysis. Hence, the purpose of the model is to detect whether the mean value of \(y_{it}\) depends on the condition \(V_{it}>c\) (or, equivalently, \(V_{it}\le c\)). In other words, we are interested in determining whether there is an association between a certain vaccination level, c, and a change (in mean) in either the incidence or the lethality rates.

Thus, \(y_{it}\) is specified according to the following spatio-temporal structure

$$\begin{aligned} y_{it}=\alpha +\beta \cdot I(V_{it-\mathrm {lag}}>c)+u_{i}+v_{i}+\mathrm {ns}(t,\mathrm {df}) \end{aligned}$$
(1)

where \(I(V_{it-\mathrm {lag}}>c)\), considering a temporal lag of 0, 7, or 14 days, is an indicator function that is equal to 1 if the (lagged) vaccination level exceeds a given threshold, c, and 0 otherwise. The use of a temporal lag is convenient in epidemiological studies, as there can be a delayed association between the exposure variable (vaccination level) and the outcome (either incidence or lethality rate) under study (Bhaskaran et al. 2013). Parameter \(\alpha \) denotes the intercept of the model and \(\beta \) the effect that a vaccination level above c has on \(y_{it}\). To study the impact of vaccination levels on both regional incidence and lethality rates, the value of c has been varied from 10 to 80%, in steps of 10 percentage points. The terms \(u_{i}\) and \(v_{i}\) represent, respectively, the structured and unstructured spatial random effects of the model; the structured effect is based on neighborhood relationships between regions (two regions are considered neighbors if they share a geographical boundary). Neighborhood relationships could account for mobility flows between regions, and partially explain the incidence rate of a region through the incidence rates of nearby regions. In the same way, it is sensible to expect that nearby regions might have similar lethality rates because of certain unobserved characteristics in common. In particular, a viral variant with a higher lethality rate could be more abundant in one zone of the country. We cannot currently account for this explicitly, but it could be partly reflected by the spatial random effects. Specifically, the Besag–York–Mollié (BYM) model has been considered (Besag et al. 1991), which establishes, under a Bayesian framework, that the conditional distribution of the spatially-structured effect on area i, \(u_{i}\), is

$$\begin{aligned} u_{i}|u_{j \ne i} \sim Normal\bigg (\sum _{j=1,\, j \ne i}^{n} w_{ij}u_{j},\frac{\sigma ^2_{u}}{N_{i}}\bigg ) \end{aligned}$$

where \(N_{i}\) is the number of neighbors for area i, \(w_{ij}\) is the (i,j) element of the row-normalized neighborhood matrix (\(w_{ij}=1/N_i\) if regions i and j are neighbors, and 0 otherwise), and \(\sigma ^2_{u}\) represents the variance of the random effect. The spatially-unstructured effect, denoted by \(v_{i}\), follows a Gaussian distribution, \(v_{i} \sim Normal(0,\sigma ^2_{v})\), where \(\sigma ^2_{v}\) is the variance of this effect. Finally, \(\mathrm {ns}(t,\mathrm {df})\) denotes a natural cubic spline on the temporal component with \(\mathrm {df}\) degrees of freedom (Hastie 2017). Natural splines are suitable for capturing trends and seasonal patterns in the data (Bhaskaran et al. 2013).
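The conditional distribution above can be illustrated with a small sketch that builds the row-normalized weights \(w_{ij}\) from an adjacency list and evaluates the conditional mean and variance of \(u_i\). The four-region map, the effect values, and the function names are hypothetical; the sketch assumes every region has at least one neighbor, and it does not reproduce how INLA represents the model internally.

```python
def row_normalised_weights(neighbours):
    """Build w[i][j] = 1/N_i if j is a neighbour of i, and 0 otherwise."""
    regions = sorted(neighbours)
    weights = {}
    for i in regions:
        n_i = len(neighbours[i])  # assumes n_i >= 1 (no isolated regions)
        weights[i] = {j: (1.0 / n_i if j in neighbours[i] else 0.0) for j in regions}
    return weights


def conditional_moments(i, u, neighbours, sigma2_u):
    """Mean and variance of u_i given all other structured effects."""
    w = row_normalised_weights(neighbours)
    mean = sum(w[i][j] * u[j] for j in u if j != i)  # average over the neighbours of i
    variance = sigma2_u / len(neighbours[i])
    return mean, variance


# Hypothetical symmetric adjacency for four regions A-D (shared borders).
neighbours = {"A": {"B", "C"}, "B": {"A", "C", "D"}, "C": {"A", "B"}, "D": {"B"}}
u = {"A": 0.4, "B": -0.2, "C": 0.1, "D": 0.3}
mean_b, var_b = conditional_moments("B", u, neighbours, sigma2_u=0.9)
print(mean_b, var_b)
```

With row-normalized weights, the conditional mean is simply the average of the neighboring effects, and the conditional variance shrinks as the number of neighbors grows.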

A modification of the model presented by Eq. 1 was also tested by allowing the slope parameter (\(\beta \)) to vary regionally

$$\begin{aligned} y_{it}=\alpha +(\beta +\gamma _{i}+\delta _{i}) I(V_{it-\mathrm {lag}}>c)+u_{i}+v_{i}+\mathrm {ns}(t,\mathrm {df}) \end{aligned}$$
(2)

where \(\gamma _{i}\) and \(\delta _{i}\) follow the same structure as the BYM model described above for \(u_i\) and \(v_i\), enabling us to account for the possibility that the effect of a vaccination level above c has differed across regions. We have not included any covariate effect other than the one based on the segmentation of vaccination levels, as this is the main purpose of the study. A term of the form \(\sum _k \eta _k x_{ki}\), where \(x_{ki}\) represents the value of covariate k in region i and \(\eta _k\) the parameter that measures its impact on \(y_{it}\), can be added to either Eq. 1 or Eq. 2 to account for regional covariates (constant over time throughout the study period) that could help explain overall incidence or lethality rates.
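The indicator term shared by Eqs. 1 and 2 can be sketched as follows for one region, together with the grid of thresholds and lags described above. The function name and the toy vaccination series are hypothetical, and marking the first days (for which no lagged value exists) as missing is an assumption made here, since the text does not state how those days were handled.

```python
def lagged_indicator(vaccination, threshold, lag):
    """I(V_{t-lag} > c) for one region; None where t-lag falls before the series starts."""
    out = []
    for t in range(len(vaccination)):
        if t - lag < 0:
            out.append(None)  # assumption: no lagged value available yet
        else:
            out.append(1 if vaccination[t - lag] > threshold else 0)
    return out


thresholds = range(10, 81, 10)  # c = 10%, 20%, ..., 80%
lags = (0, 7, 14)               # temporal lags in days

# Toy daily vaccination levels (%) for a single hypothetical region.
levels = [8.0, 9.5, 10.5, 12.0, 19.0, 21.0]
print(lagged_indicator(levels, threshold=10, lag=0))  # [0, 0, 1, 1, 1, 1]
print(lagged_indicator(levels, threshold=10, lag=2))  # [None, None, 0, 0, 1, 1]
```

One model is then fitted for each combination of threshold and lag, and the resulting estimates of \(\beta \) are compared.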

All the models have been fitted with the INLA R package (Lindgren and Rue 2015), which makes use of the Integrated Nested Laplace Approximation proposed by Rue et al. (2009). Non-informative priors for the fixed-effects parameters and the default Gamma-distributed priors for the precision of the random effects were employed.

3.2 Software

The R programming language (R Core Team 2020a) has been used in our analysis. In particular, the R packages ggplot2 (Wickham 2016), rgdal (Bivand et al. 2019), rgeos (Bivand and Rundel 2020), spdep (Bivand et al. 2008), and splines (R Core Team 2020b) have also been used.

4 Results

Figure 3 summarizes the results of fitting the model with change points in the vaccination levels given by Eq. 1. Natural cubic splines with 16 and 10 degrees of freedom were finally chosen for modeling incidence and lethality rates, respectively, as they yielded the best results according to the DIC proposed by Spiegelhalter et al. (2002). Specifically, the estimates of parameter \(\beta \) are shown considering the incidence (Fig. 3a) and the lethality rates (Fig. 3b) as the response variable of the model. The 95% credibility intervals corresponding to each estimate are also displayed. As mentioned earlier, temporal lags of 0, 7, and 14 days are tested to account for delayed exposure-outcome associations. In the case of the incidence rates, most of the \(\beta \) estimates are not different from 0 with 95% credibility, except for the estimates corresponding to \(c=30\%\) (\(\mathrm {lag}=7\)), \(c=50\%\) (\(\mathrm {lag}=0\)), and \(c=60\%\) (\(\mathrm {lag}=0\) and \(\mathrm {lag}=7\)). The association for \(c=50\%\) is likely a consequence of the wave of COVID-19 that Spain suffered in mid-July 2021, when vaccination levels were around 50%. Despite these results, the associations between vaccination levels and incidence rates for the study period, considering both the modeling approach chosen and the scale of the geographical units under analysis, do not show a clear trend. Indeed, the strong association for \(c=50\%\) with \(\mathrm {lag}=0\) disappears when a temporal lag of 7 or 14 days is considered, even though the impact of varying the temporal lag is generally minor. Regarding these models on incidence rates, it is worth noting that an estimate of the susceptible population (obtained by subtracting the number of recorded cases from the population size) was also tested as the denominator in the definition of \(I_{it}\) with the aim of increasing its reliability, but no remarkable differences were observed in the results.
Moreover, recorded incidence levels may sometimes be inaccurate because of the presence of asymptomatic carriers, which could distort the results.
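For reference, the DIC of Spiegelhalter et al. (2002) used above for choosing the spline degrees of freedom is commonly written as shown next, where \(D(\theta )=-2\log p(y|\theta )\) is the deviance, \({\bar{D}}\) is its posterior mean, and \({\bar{\theta }}\) is the posterior mean of the parameters. This notation is introduced here only for illustration and is not used elsewhere in the paper. Lower values indicate a better trade-off between goodness of fit and model complexity, the latter being measured by the effective number of parameters \(p_{D}\).

$$\begin{aligned} \mathrm {DIC}={\bar{D}}+p_{D},\qquad p_{D}={\bar{D}}-D({\bar{\theta }}) \end{aligned}$$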

In contrast, the estimates of the \(\beta \) parameter for the models describing lethality rates suggest that lethality rates are decreasing progressively. In particular, once vaccination reached 50% of the population, the estimates of \(\beta \) are negative with 95% credibility, which indicates a change point in the time series of lethality rates at this stage of the vaccination process. This should not be interpreted as a cause-effect relationship; it only indicates the vaccination level at which the lethality rates of the regions of Spain might have started to display some reduction.

Furthermore, Fig. 4 shows the estimates of the spatial effects (\(u_i+v_i\)) on incidence (Fig. 4a) and lethality rates (Fig. 4b) only for \(c=60\%\), as DIC is minimized for this value of c. Nevertheless, the patterns are almost identical for other values of c. There is a marked spatial variation in both incidence and lethality rates. In particular, some of the more densely populated regions of Spain have experienced the highest incidence risk (Madrid, Cataluña, and the Comunidad Valenciana). Lethality risk, however, has been higher in regions such as Asturias and Castilla y León, partially as a consequence of their larger share of population in the older age ranges. Indeed, including the proportion (in %) of the population of the region aged 65 years and over as another covariate of the model yields a parameter estimate \({\hat{\eta }}_{eld\_pop}=0.15\) (with 95% credible interval [0.01, 0.29]), considering \(c=60\%\), which indicates a positive association, at the region level, between the lethality rate and the proportion of elderly population. Similarly, a positive association has also been found between population density levels and incidence rates, with a parameter estimate \({\hat{\eta }}_{pop\_dens}=1.24\times 10^{-2}\) and a 95% credible interval \([2.16\times 10^{-3}, 2.28\times 10^{-2}]\) (again, these results correspond to \(c=60\%\), but similar parameter estimates are also obtained for other values of c). Although including these covariate effects barely alters the DIC of the models, Fig. 5 allows us to visualize these associations, considering monthly averages of either the incidence (Fig. 5a) or the lethality rates (Fig. 5b). In particular, the correlation between population density levels and incidence rates seems to be highly variable over time, an issue that might deserve further study.
To conclude the analysis of the spatial effect estimates, the proportion of variance explained by each spatial effect (Blangiardo and Cameletti 2015) reveals that the unstructured component, \(v_i\), captures around 99% and 95% of the spatial variability of the incidence and lethality rates, respectively, at the regional level studied. This suggests that spatial proximity between regions is not sufficient to explain the differences in incidence and lethality observed during the last months of the COVID-19 pandemic.
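One common way of obtaining such a proportion, sketched below, is to compare the empirical variance of the estimated structured effects \(u_i\) with the variance \(\sigma ^2_{v}\) of the unstructured effect, because the variance parameter \(\sigma ^2_{u}\) is conditional and not directly comparable with \(\sigma ^2_{v}\). This is an assumption about the general recipe rather than a reproduction of the exact computation performed in this study, and the numbers and function name are hypothetical.

```python
from statistics import variance


def structured_fraction(u_effects, sigma2_v):
    """Share of the spatial variance attributed to the structured effect u."""
    s2_u = variance(u_effects)  # empirical (marginal) variance of the u_i
    return s2_u / (s2_u + sigma2_v)


# Hypothetical values: nearly flat structured effects, larger unstructured variance.
u_hat = [0.02, -0.01, 0.00, 0.01, -0.02]
frac_u = structured_fraction(u_hat, sigma2_v=0.05)
print(round(1 - frac_u, 3))  # share captured by the unstructured component v
```

In a full Bayesian analysis, the same ratio would typically be computed for each posterior draw and then summarized, instead of being evaluated once at point estimates.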

In addition, Fig. 6 summarizes the results of fitting Eq. 2, which allows for regionally-varying covariate effects. For this modeling approach, we have focused only on lethality rates, as no clear association was found for incidence rates with the model given by Eq. 1. Figure 6a shows the estimates of \(\gamma _i+\delta _i\) for each of the regions under analysis, considering \(c=60\%\) and \(\mathrm {lag}=7\). There are marked differences, as the estimates for six of the regions are different from 0 with 95% credibility. Figure 6b displays the evolution of the lethality rates for these regions, indicating the date at which vaccination levels reached 60%. It can be observed that lethality rates dropped remarkably in Asturias, Castilla y León, and Cataluña at the time this level of vaccination was reached. Again, it must be noted that cause-effect relationships cannot be derived from this analysis, but only exposure-outcome associations.

Fig. 3 Estimated effect (with the 95% credibility interval) of a certain vaccination level on incidence (a) and lethality (b) rates. These estimates correspond to \(\beta \) in Eq. 1

Fig. 4 Estimated spatial effects at the region level for incidence (a) and lethality (b) rates, considering \(\mathrm {lag}=7\). These estimates correspond to \(u_i+v_i\) in Eq. 1

Fig. 5 Scatter plots of population density versus incidence rate (a), and proportion of population aged 65 and over versus lethality rate (b), considering monthly averages for some months within the study period

Fig. 6 Estimated region-specific random slopes (with the 95% credibility intervals) for the modeling of lethality rates, considering \(c=60\%\) and \(\mathrm {lag}=7\) (a). These estimates correspond to \(\gamma _i+\delta _i\) in Eq. 2. The evolution of lethality rates for six specific regions is also shown (b). Each dashed vertical line represents the date at which the vaccination level reached 60% for the corresponding region

5 Discussion and conclusions

According to a study performed in Israel, COVID-19 vaccination is highly effective in preventing symptomatic and asymptomatic SARS-CoV-2 infections, as well as COVID-19-related hospitalisations and deaths (Haas et al. 2021). The studies of VE performed in Spain, where around 80% of the population has already been vaccinated (at the time of this writing), have provided encouraging results so far. Even though the vaccination campaign in Spain was carried out at a much slower speed than in Israel, COVID-19 vaccination using mRNA vaccines in Spain has been shown to be very effective in preventing infections, hospitalisations and deaths among residents of elderly long-term care facilities (LTCF) (Mazagatos et al. 2021). Moreover, these results showed a similar level of protection against asymptomatic and symptomatic infections among fully vaccinated LTCF residents. Another study, performed in a single region of Spain, Navarra, showed that COVID-19 VE was moderate in preventing SARS-CoV-2 infection and higher against symptomatic and hospitalised cases (Martínez-Baz et al. 2021). Therefore, these studies have already shown that vaccination has reduced hospitalization and lethality in Spain. However, the effectiveness of COVID-19 vaccines in reducing transmission is still unclear, even though the study performed in Israel showed an association between COVID-19 incidence and vaccination (Haas et al. 2021). Therefore, preventive measures such as social distancing, mask wearing and hand washing, which are very efficient in reducing viral transmission (Wilder-Smith and Freedman 2020), should be kept in place while SARS-CoV-2 is still circulating.

In the present study, COVID-19 incidence and lethality rates have been studied at the regional level through segmented spatio-temporal models. This kind of macroscopic analysis does not allow VE to be quantified, but it makes it possible to capture general trends in space and time. In particular, the fitted models are able to detect change points in the vaccination levels that are reflected in the time series of either incidence or lethality rates. We have observed that increasing levels of vaccination display an association with reduced lethality rates. This has not been observed for incidence rates. In addition, the spatial component of the model has enabled us to determine incidence and lethality risks for the regions chosen for the analysis, whereas the inclusion of a regionally-varying slope in the model has proven useful for capturing the differential behavior of some of the regions.

Finally, it is worth noting again that this type of ecological analysis has clear limitations: it does not allow VE to be quantified, and only supports an exploratory (spatio-temporal) analysis of large-scale indicators. Nevertheless, these models could be helpful as an additional pandemic monitoring tool, especially if more covariates are added, or if the possibility that the segmented covariate (in this case, the vaccination level) presents several change points is also incorporated. Regarding the inclusion of more covariates, considering binary variables that reflect how the restrictions and prevention measures have changed over time seems advisable to avoid confounding effects. Accounting for the different types of administered COVID-19 vaccines in the analysis could also be of great interest for future research in this area.