1 Introduction

Air pollution, composed of particulate matter (\({\text{PM}}\)) and gaseous pollutants, has a substantial negative impact on the environment, ecosystems and human health. Poor air quality is one of the five most significant health risks worldwide, alongside high blood pressure, smoking, diabetes and obesity (Cohen et al. 2017; Daellenbach et al. 2020). It is a particular health concern for residents of areas with higher population density (Dias and Tchepel 2018), for centres with dense activities, and for particular user groups (Agarwal and Kaddoura 2019; Singh et al. 2021). Among all pollutants, particulate matter with an aerodynamic diameter less than or equal to 10 and 2.5 μm (\({\text{PM}}_{10}\) and \({\text{PM}}_{2.5}\), respectively) is most consistently connected with numerous adverse health outcomes, including lung infections, cardiovascular diseases, and respiratory problems (Joseph et al. 2003; Martuzzi et al. 2006; Samoli et al. 2013), while appropriate regulation directly reduces the adverse health effects, increases general well-being, and improves public health (Steinle et al. 2015).

In Europe, although the European Environment Agency (EEA) maintains a rather dense PM monitoring network to record the concentration levels across countries, huge regions of the European continent remain unmonitored. For proper assessment of population-wide exposure and appropriate formulation of pollution mitigation strategies, the responsible authorities need to accurately estimate and predict the concentration levels at unobserved locations (Chu et al. 2015).

The main challenge in forecasting PM concentrations lies in the complexity of PM generation and spreading dynamics. On the one hand, PM generation is dominated by two complicated sources: inorganic aerosol coming from the agriculture, long-range transport and energy sectors (Steinle et al. 2013; Daellenbach et al. 2020), and organic aerosol coming from biomass and fossil fuel burning, vehicle emissions and cooking (Lenschow et al. 2001; Omidvarborna et al. 2015). On the other hand, PM spread depends on both meteorological conditions and land use dispersion, causing the observed concentration levels to fluctuate geographically and temporally.

In the statistical literature, the Bayesian spatio-temporal model, which allows modelling a complex environmental phenomenon through a hierarchy of sub-models, has become one of the most promising methodologies in air quality research (Cameletti et al. 2013; Amin et al. 2015; Taheri Shahraiyni and Sodoudi 2016; Forlani et al. 2020; Fioravanti et al. 2021; Castro-Camilo et al. 2021). In particular, this approach makes it possible to include explanatory variables that explain the large-scale variability, to take residual dependency into account through a space-time process with a Gaussian random field (GRF), and to produce high-resolution spatial forecasts that meet the rising demand for predictive concentration maps in epidemiological studies.

According to Porcu et al. (2012), the main drawback of the Bayesian model with a GRF is the computational difficulty of dealing with enormous amounts of data, especially when applying complex spatial dependence measures (e.g., the Matérn covariance function). Several strategies have been proposed to alleviate the computational burden of fitting complex spatial and temporal models. Lindgren et al. (2011) proposed the stochastic partial differential equation (SPDE) approach, providing a method to represent a continuous Matérn field through a discretely indexed Gaussian Markov random field (GMRF) associated with a sparse precision matrix, which enjoys good computational properties. Rue et al. (2009) also provided the integrated nested Laplace approximation (INLA) algorithm that performs direct numerical calculations on the marginal posterior distributions, avoiding the time-consuming Markov chain Monte Carlo (MCMC) simulations. Additionally, the GMRF with the SPDE approach can be fitted in a Bayesian hierarchical framework through the INLA approach, with implementation in the R-INLA package available at https://www.r-inla.org/, making this methodology fast and easy to implement.

Most previous spatial and temporal studies on air pollution only concentrated on moderate (i.e., daily, monthly and annual mean) PM concentrations (Cameletti et al. 2013; Beloconi et al. 2018; Fioravanti et al. 2021; Saez and Barceló 2022). However, extreme conditions are of greater concern for environmental quality management due to their various hazardous impacts (Amin et al. 2015). Numerous epidemiological studies pointed out that short-term exposure to severe PM pollution can trigger serious acute cardiovascular and respiratory mortality (Orellano et al. 2020; Lei et al. 2019; Zhang et al. 2019; Yu et al. 2014; Brook et al. 2016) and huge economic losses from the corresponding hospitalizations (Xie et al. 2021; Shah et al. 2013). In the field of extreme case spatio-temporal analysis, Sharma et al. (2012), Rodríguez et al. (2016), Amin et al. (2015), Martins et al. (2017) and Castro-Camilo et al. (2021) typically focused on a small spatial domain, making it difficult to consider complicated orography with a variety of climatic conditions, as well as to provide general suggestions to national governments on environmental policy formulation and health care allocation. More importantly, to the best of our knowledge, no studies consider the potential differences between moderate and extreme air pollution; in other words, none model air pollution at different scales simultaneously to identify similarities and differences in the effects of influential factors.

In this paper, we focus on the spatial and temporal variation of both moderate and extreme air pollution (i.e., annual mean and annual maxima of daily \({\text{PM}}_{10}\) concentration levels) in mainland Spain from 2017 to 2021, after controlling for meteorological variables and socio-economic factors. The contribution is two-fold: the prediction of extreme pollution (excursion functions) and the investigation of similar/reverse effects of predictors at different scales. First, we establish several Bayesian hierarchical generalized extreme models on annual maxima and select the best model based on predictive performance, detailed in Sects. 3.1 and 4.1. Second, we utilize the joint Bayesian model with sharing effects in Sects. 3.2 and 4.3 to model both annual mean and maxima concentrations simultaneously. We observe the comparable influence of precipitation, vapour pressure, and population density, as well as the possible opposite effects of altitude and temperature. We also generate excursion function maps (Bolin and Lindgren 2015) based on the joint model to highlight the regional risk ranking that simultaneously exceeds the warning risk threshold in Sect. 4.4. These main findings, together with comprehensive knowledge of \({\text{PM}}_{10}\) generation and spread and high-resolution spatial forecasts, are expected to promote awareness of the significance of extreme air pollution research, help in the investigation of long-term effects in epidemiological studies, and underpin air pollutant regulation and human health protection strategies for environmental agencies.

The rest of the paper is organized as follows. In Sect. 2, we describe the dataset for the response and the main explanatory variables. In Sect. 3, we formalize spatio-temporal Bayesian generalized extreme models on moderate-extreme air pollution in the framework of Bayesian spatial analysis and extreme value theory. We present the main results, applications and potential influence in Sect. 4, and conclude this paper with an extended discussion in Sect. 5.

2 Data

2.1 PM\(_{10}\) concentrations and spatial domain

In mainland Spain, \({\text{PM}}_{10}\) concentration data are accessible through the EEA’s air quality database, Air Quality e-Reporting, which consists of multi-annual time series of air quality measurements and calculated statistics for a number of air pollutants. To work with a more robust dataset, we only retain the 342 air pollution stations that have at least 60% valid observations in a year; the geographical distribution of the stations is shown as red circles embedded in the mesh constructed for the SPDE approach (Fig. 1). This five-year dataset (2017–2021) with 1470 observations is divided into a training set (2017–2020; 1215 observations) and a validation set (2021; 255 observations), which are used to evaluate and compare model fit and predictive ability.

Fig. 1

Study domain together with the spatial distribution of the 342 monitoring sites in red circles. The figure also illustrates the mesh used to build the SPDE approximation to the continuous Matérn field

Figure 2 shows the temporal and spatial variations of extreme \({\text{PM}}_{10}\) following the EEA’s air quality categories: 0–20 μg/m\(^3\) (Good), 20–40 μg/m\(^3\) (Fair), 40–50 μg/m\(^3\) (Moderate), 50–100 μg/m\(^3\) (Poor), 100–150 μg/m\(^3\) (Very poor), and more than 150 μg/m\(^3\) (Extremely poor). Temporally, severe pollution seems to occur in 2017 and 2020, as indicated by the numerous monitors coloured in red (very poor) and purple (extremely poor). Spatially, compared with relatively low annual maxima recorded in the east (Valencian Community) and north (Basque Country), high \({\text{PM}}_{10}\) concentrations are most prevalent in the centre, northwest and southeast, which correspond to the autonomous communities of Madrid, Galicia, Andalusia, and the Region of Murcia, respectively. This spatio-temporal pattern motivates our spatio-temporal modelling with appropriate topography covariates, which are described in the following section.
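The category bands listed above can be turned into a small lookup, sketched here in Python for illustration only. The function name `eea_category` is hypothetical, and assigning a value that sits exactly on a boundary (e.g., 20 μg/m\(^3\)) to the lower category is an assumption, since the bands as quoted do not say which side a boundary belongs to.

```python
from bisect import bisect_left

# Upper bounds (in μg/m³) of the first five categories quoted in the text.
UPPER_BOUNDS = [20, 40, 50, 100, 150]
LABELS = ["Good", "Fair", "Moderate", "Poor", "Very poor", "Extremely poor"]


def eea_category(pm10):
    """Map a PM10 concentration (μg/m³) to its air quality category."""
    if pm10 < 0:
        raise ValueError("concentration must be non-negative")
    # bisect_left puts a value equal to a bound into the lower category
    # (an assumption; the text does not specify the boundary convention).
    return LABELS[bisect_left(UPPER_BOUNDS, pm10)]


# Made-up concentrations, for illustration only.
for value in (12.0, 45.0, 100.0, 180.0):
    print(value, eea_category(value))
```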

Fig. 2

Spatio-temporal patterns of annual maxima \({\text{PM}}_{10}\) concentration levels in years 2017–2021 throughout mainland Spain. The annual data are reported by the European Environment Agency (EEA) and shown in the heat map with the EEA’s air quality categories

2.2 Explanatory variables

A number of potential predictors are available based on prior findings in the air quality literature (Cameletti et al. 2013; Fioravanti et al. 2021; Castro-Camilo et al. 2021), and we choose to include a set of five main spatial and spatio-temporal varying predictors with the complete description list reported in Table 1.

Table 1 Description for explanatory variables

In the following, we describe the selected predictors in detail.

Meteorological variables. Monthly means of the meteorological variables (temperature, precipitation and vapour pressure) are collected from the CRU TS (Climatic Research Unit gridded Time Series; Harris et al. 2020) dataset and aggregated to annual means. CRU TS was first published in 2000 and uses ADW (angular-distance weighting) to interpolate anomalies of monthly observations onto a 0.5° grid over land surfaces (excluding Antarctica) for observed and derived variables (mean, minimum and maximum temperatures, precipitation, vapour pressure, wet days and cloud cover), with no missing values in the defined domain.

Elevation. The altitude data, height over sea level, matched with locations of all air pollution monitors, is accessible in the annual aggregated air quality values dataset provided by EEA, available at https://discomap.eea.europa.eu/App/AirQualityStatistics/index.html.

Population density. The population densities are calculated in each autonomous community. The original data is collected from the statistics report (available at https://stats.oecd.org/) of the Organisation for Economic Co-operation and Development (OECD). The OECD statistics contain data and metadata for economic and education indexes of OECD countries and some selected non-member economies.

We see in Table 2 that the correlations of the potential predictors with extreme and average PM\(_{10}\) concentrations can have the same or opposite directions, and also vary year by year. This will be further investigated by our spatio-temporal generalized extreme model and joint model with sharing effects; see details in Sects. 3.1 and 3.2. Furthermore, considering the correlation between location variables and meteorological variables (e.g., latitude and temperature), we include the location variables (longitude and latitude) as covariates to adjust for location when investigating the impact of meteorological factors.

Table 2 Correlation between explanatory variables and annual mean and annual maximum PM\(_{10}\) concentrations on log scale

3 Model formulation

In Sect. 3.1, we first provide a brief introduction to spatio-temporal Gaussian field and extreme value models, followed by four candidate models for the annual maximum of daily PM\(_{10}\) concentrations in mainland Spain. Subsequently, in Sect. 3.2, we present the joint Bayesian Gumbel-Gaussian model for both extreme and moderate levels.

3.1 Spatio-temporal extreme value models with mixed effects

The generalized extreme value (GEV) distribution is widely employed for modelling extremes in environmental science (Reiss and Thomas 2007), such as temperature (Cheng et al. 2014), precipitation (Panagoulia et al. 2014), air pollution (Deng and Zhang 2018; Martins et al. 2017) and sea level (Lobeto et al. 2018). The GEV distribution has three parameters, location parameter (\(\mu\), \(-\infty<\mu <\infty\)), scale parameter (\(\sigma\), \(\sigma >0\)) and tail parameter (\(\xi\), \(-\infty<\xi <\infty\)) with the cumulative distribution function

$$\begin{aligned} \text{ GEV }(x; \mu ,\sigma , \xi )=\exp \left\{ -\left[ 1+\xi \left( \frac{x-\mu }{\sigma }\right) \right] _{+}^{-\frac{1}{\xi }}\right\} . \end{aligned}$$

The case \(\xi =0\) is interpreted as the limit case of \(\xi \rightarrow 0\), leading to the Gumbel family with distribution function

$$\begin{aligned} \text{ Gumbel }(x; \mu , \sigma )=\exp \left\{ -\exp \left[ -\left( \frac{x-\mu }{\sigma }\right) \right] \right\} , \quad x \in \mathbb {R}. \end{aligned}$$
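The limiting relation between the two distribution functions can be checked numerically. The following Python sketch implements both formulas as written above and prints the gap between them as \(\xi\) shrinks; the parameter values are made up for illustration and are not estimates from this study.

```python
import math


def gev_cdf(x, mu, sigma, xi):
    """GEV distribution function; uses the Gumbel form when xi == 0."""
    z = (x - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the lower endpoint (xi > 0) the CDF
        # is 0, above the upper endpoint (xi < 0) it is 1.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def gumbel_cdf(x, mu, sigma):
    """Gumbel distribution function."""
    z = (x - mu) / sigma
    return math.exp(-math.exp(-z))


# Illustrative (made-up) values on the log scale.
mu, sigma, x = 4.0, 0.3, 4.5
for xi in (0.1, 0.01, 0.001):
    gap = abs(gev_cdf(x, mu, sigma, xi) - gumbel_cdf(x, mu, sigma))
    print(f"xi = {xi}: |GEV - Gumbel| = {gap:.2e}")
```

The printed gap shrinks with \(\xi\), which is the numerical counterpart of the limit \(\xi \rightarrow 0\).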

Considering the potential spatial and temporal dependence among the PM\(_{10}\) concentrations, we use two typical approaches for measurement: Matérn covariance function (Matérn 1986; Guttorp and Gneiting 2006) with the stochastic partial differential equations (SPDE; Lindgren et al. 2011) approximation for spatial correlation and the auto-regressive dynamic model (AR; Shumway and Stoffer 2017) for temporal dependence. We establish two groups of generalised extreme value models with similar fixed effects and varying random effects to model the annual maxima \({\text{PM}}_{10}\) concentrations.

3.1.1 Model 1: Gumbel model with fixed effect and spatio-temporal random effect

Let \(y_{\text {max}}({\varvec{s}},t)\) denote the logarithm transform of annual maxima \({\text{PM}}_{10}\) concentrations at location \({\varvec{s}} \in \mathcal {S}\) and year \(t \in \mathcal {T}\), where \(\mathcal {S}\) is the study area and \(\mathcal {T}\) is the time period in focus. Under the assumption of constant scale (\(\sigma\)) and tail (\(\xi\)) parameters, we use a linear combination of fixed effects with explanatory variables and spatio-temporal varying random effect to model the location parameter (\(\mu ({\varvec{s}},t)\)) in the Gumbel model below. Suppose that

$$\begin{aligned} \begin{aligned} \left[ y_{\text {max}}({\varvec{s}},t) \mid \mu ({\varvec{s}},t), \sigma \right]&\sim {\text {Gumbel}}\left( \mu ({\varvec{s}},t), \sigma \right) \\ \text{ with } \quad \mu ({\varvec{s}},t)&= {\textbf{x}}({\varvec{s}},t)^\top \varvec{\beta }+u({\varvec{s}},t) \end{aligned} \end{aligned}$$
(1)

and

$$\begin{aligned} \begin{aligned} u({\varvec{s}},t)&= a u({\varvec{s}},t-1 ) +w\left( {\varvec{s}}, t\right) , \\ w\left( {\varvec{s}}, t\right)&\sim \mathcal {G} \mathcal {P}_{2 \textrm{D}-{\text {SPDE}}}\left( \rho _M, \sigma _M, \nu _M\right) . \end{aligned} \end{aligned}$$
(2)

Here, the vector \({\textbf{x}}({\varvec{s}},t)\) contains an intercept and the explanatory variables (location variables, meteorological variables and human-effect variables) listed in Table 1, and the vector \(\varvec{\beta }\) corresponds to the regression coefficients associated with the fixed effects. The term \(u({\varvec{s}},t)\) represents a spatio-temporally varying random effect that incorporates spatio-temporal interaction (Cameletti et al. 2013). It changes over time according to AR(1) dynamics with autocorrelation parameter a and spatially correlated, serially independent innovations \(w\left( {\varvec{s}}, t\right)\).

Given two locations \(s_i\) and \(s_j\) separated by \(h=d(s_i,s_j)\) (normally Euclidean) units, the Gaussian process with mean 0 and Matérn covariance function is in the form of

$$\begin{aligned} {\text {Cov}}(w(s_i,t),w(s_j,t'))= {\left\{ \begin{array}{ll}0, &{} t \ne t', \\ {\frac{\sigma ^2}{2^{\nu-1} \ \ \Gamma (\nu )}\left( \sqrt{8 \nu } \frac{h}{\rho }\right) ^{\nu } K_{\nu }\left( \sqrt{8 \nu } \frac{h}{\rho }\right) }, &{} t=t',\end{array}\right. } \end{aligned}$$

where \(\Gamma\) is the gamma function, \(K_\nu\) is the modified Bessel function of the second kind, \(\rho >0\) is the range parameter, \(\nu >0\) is the smoothness parameter, and \(\sigma ^2>0\) is the marginal variance.
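To make the role of the range parameter concrete, the sketch below evaluates the same-year (\(t=t'\)) branch of the covariance in plain Python. It is only an illustration: the Bessel function \(K_\nu\) is approximated here by a trapezoidal rule applied to the integral representation \(K_\nu(x)=\int_0^\infty e^{-x\cosh t}\cosh (\nu t)\,dt\), a stand-in chosen to keep the example free of dependencies (a library routine would normally be used), and the parameter values are made up.

```python
import math


def bessel_k(nu, x, upper=20.0, steps=4000):
    """Approximate K_nu(x) for moderate x > 0 by the trapezoidal rule on
    the integral of exp(-x cosh t) cosh(nu t) over [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(-x * math.cosh(t)) * math.cosh(nu * t)
    return total * h


def matern_cov(h, sigma, rho, nu=1.0):
    """Matérn covariance at distance h, in the parameterisation of the text."""
    if h == 0.0:
        return sigma ** 2  # limit of the expression as h -> 0
    scaled = math.sqrt(8.0 * nu) * h / rho
    coef = sigma ** 2 / (2.0 ** (nu - 1.0) * math.gamma(nu))
    return coef * scaled ** nu * bessel_k(nu, scaled)


# Made-up values: sigma = 1 and a range of 100 km, distances in metres.
sigma, rho = 1.0, 100_000.0
for h in (10_000.0, 50_000.0, 100_000.0, 200_000.0):
    print(f"h = {h / 1000:.0f} km: correlation = {matern_cov(h, sigma, rho):.3f}")
```

With \(\nu =1\), the correlation at distance \(h=\rho\) is roughly 0.14, which is why \(\rho\) is read as the distance beyond which the spatial correlation becomes small.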

3.1.2 Model 2: Gumbel model with fixed effect and separated spatial/temporal/spatio-temporal random effects

Our second model modifies Model 1 specified in Eq. (1) by adding separate spatial and temporal random effects to the spatio-temporal interaction effect specified in Eq. (2). We keep the Gumbel distribution assumption on \(y_{\text {max}}({\varvec{s}},t)\):

$$\begin{aligned} \begin{aligned} \left[ y_{\text {max}}({\varvec{s}},t) \mid \mu ({\varvec{s}},t), \sigma \right]&\sim {\text {Gumbel}}\left( \mu ({\varvec{s}},t), \sigma \right) ,\\ \mu ({\varvec{s}},t)&= {\textbf{x}}({\varvec{s}},t)^\top \varvec{\beta }+f(t)+w({\varvec{s}}) + {u({\varvec{s}},t)},\\ \text{ with }\ \ f(t)&\sim \mathcal {G} \mathcal {P}_{\textrm{AR} (1)}\left( a, \tau _{AR}\right) ,\\ w\left( {\varvec{s}}\right)&\sim \mathcal {G} \mathcal {P}_{2 \textrm{D}-{\text {SPDE}}}\left( \rho _M, \sigma _M, \nu _M\right) . \end{aligned} \end{aligned}$$
(3)

Note that \(u({\varvec{s}},t)\) is defined as in Eq. (2) and represents the spatio-temporal interaction term with the Kronecker product structure; see e.g., Cameletti et al. (2011, 2013) and Fioravanti et al. (2021). The term f(t) denotes the non-linear temporal random effect with AR(1) structure, and \(w({\varvec{s}})\) is the purely spatial random effect with SPDE structure. Specifically, the implementation of the AR(1) model in INLA assumes Gaussian white noise with mean 0 and precision \(\tau _{AR}\). For f(t) defined over the naturally binned covariate (Year), let \({\varvec{t}}=\left( t_1, \ldots , t_5\right) ^\top\) denote the time from the first year (\(t_1\)) to the last year (\(t_5\)); then

$$\begin{aligned} \begin{aligned} f(t_1)&\sim \mathcal {N}\left( 0,\left( \tau _{AR}\left( 1-a^2\right) \right) ^{-1}\right) , \\ f(t_i)&=a f(t_{i-1})+\epsilon _i, \quad \epsilon _i \sim \mathcal {N}\left( 0, \tau _{AR}^{-1}\right) , \quad i=2, \ldots , 5, \end{aligned} \end{aligned}$$

where \(-1<a<1\) is a numeric constant, the so-called autocorrelation, by which we multiply the lagged variable \(f(t_{i-1})\), and \(\epsilon _i\) denotes the unpredictable error in the form of Gaussian white noise.
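The initial variance \(\left( \tau _{AR}\left( 1-a^2\right) \right) ^{-1}\) is exactly the value that keeps the marginal variance of f constant over the five years. A minimal Python sketch, with made-up values for a and \(\tau _{AR}\), propagates the variance through the recursion and also draws one path; the function names are hypothetical.

```python
import random


def ar1_marginal_variances(a, tau, n=5):
    """Variance of f(t_1), ..., f(t_n) implied by the recursion in the text."""
    variances = [1.0 / (tau * (1.0 - a ** 2))]  # stationary start at t_1
    for _ in range(1, n):
        # Var(a * f_prev + eps) = a^2 * Var(f_prev) + 1 / tau
        variances.append(a ** 2 * variances[-1] + 1.0 / tau)
    return variances


def simulate_ar1(a, tau, n=5, rng=None):
    """Draw one path f(t_1), ..., f(t_n) of the AR(1) process."""
    rng = rng or random.Random(0)
    path = [rng.gauss(0.0, (tau * (1.0 - a ** 2)) ** -0.5)]
    for _ in range(1, n):
        path.append(a * path[-1] + rng.gauss(0.0, tau ** -0.5))
    return path


a, tau = 0.8, 4.0  # made-up values with |a| < 1
print(ar1_marginal_variances(a, tau))
print(simulate_ar1(a, tau))
```

All five variances coincide, so the process is stationary from the first year onwards.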

3.1.3 Model 3: GEV model with fixed effect and spatio-temporal random effect

The generalised extreme value (GEV) models follow the same structure as the Gumbel models, except that the response follows the generalized extreme value distribution. Specifically, we suppose that

$$\begin{aligned} \left[ y_{\text {max}}({\varvec{s}},t) \mid \mu ({\varvec{s}},t)\right] \sim {\text {GEV}}\left( \mu ({\varvec{s}},t), \sigma , \xi \right) , \end{aligned}$$
(4)

where the location parameter \(\mu ({\varvec{s}},t)\) is of the same form of mixed effects as in Eq. (1) and random effects in Eq. (2).

3.1.4 Model 4: GEV model with fixed effect and separated spatial/temporal/spatio-temporal random effects

Just as Model 2 modifies Model 1, we consider the following model in parallel with Model 3, i.e., we add spatial and temporal random effects with an interaction effect to the GEV model. Suppose that

$$\begin{aligned} \begin{aligned} \left[ y_{\text {max}}({\varvec{s}},t) \mid \mu ({\varvec{s}},t), \sigma , \xi \right]&\sim {\text {GEV}}\left( \mu ({\varvec{s}},t), \sigma , \xi \right) \end{aligned} \end{aligned}$$
(5)

where the location parameter \(\mu ({\varvec{s}},t)\) has the spatio-temporal structure given in Eq. (3).

It is worth pointing out that the well-known limit type theorem in Coles (2001) motivates our GEV distribution assumption for the annual maxima of PM\({_{10}}\) concentrations, and its reduced case (i.e., the Gumbel distribution, which corresponds to the GEV with \(\xi =0\)). The latter is more parsimonious, and its exponentially decaying tail might be appropriate for extreme air quality (Deng and Zhang 2018). The Gumbel model is generally preferred when the GEV fit yields an estimate of \(\xi\) close to zero, so that it is uncertain whether \(\xi\) differs from zero.

3.2 Bayesian joint model with sharing effects

In order to identify the potential varied effect levels of main explanatory variables, with inspiration from applications of the joint model with sharing effects on wildfire (Koh et al. 2023), we model both moderate and extreme PM\(_{10}\) pollution simultaneously in two respective sub-models linked by the sharing effects and the sharing coefficients (scaling factors).

Let \(y_{\text {mean}}({\varvec{s}},t)\) denote the logarithm transform of annual mean \({\text{PM}}_{10}\) at location \({\varvec{s}} \in \mathcal {S}\) and year \(t \in \mathcal {T}\). We fit the Gaussian sub-model and the Gumbel sub-model to the annual mean and annual maxima simultaneously, adopting the structure of Model 1, the best extreme value model according to the model fit and prediction analysed in Sect. 4.1.

$$\begin{aligned} \begin{aligned} {\left[ y_{\text {mean}}({\varvec{s}},t) \mid \mu _{\text {mean}}({\varvec{s}},t), \sigma _{\text {mean}}^2 \right] }&\sim {\text {Gaussian}}\left( \mu _{\text {mean}}({\varvec{s}},t), \sigma _{\text {mean}}^2 \right) \\ \text{ with }\qquad \mu _{\text {mean}}({\varvec{s}},t)&= {{\textbf{x}}^{S}({\varvec{s}},t)^\top \varvec{\beta }^{S}}+{{\textbf{x}}^{N\!S}({\varvec{s}},t)^\top \varvec{\beta }_{\text {mean}}^{N\!S}} +{u^{S}({\varvec{s}},t)}; \\ {\left[ y_{\text {max}}({\varvec{s}},t) \mid \mu _{\text {max}}({\varvec{s}},t), \sigma _{\text {max}}\right] }&\sim {\text {Gumbel}}\left( \mu _{\text {max}}({\varvec{s}},t), \sigma _{\text {max}}\right) \\ \text{ with } \qquad \mu _{\text {max}}({\varvec{s}},t)&=\ {{\textbf{x}}^{S}({\varvec{s}},t)^\top \varvec{\beta }^{S}}{\varvec{\beta }_{1}^{\text {Gaussian-Gumbel}}}\\&\quad +{{\textbf{x}}^{N\!S}({\varvec{s}},t)^\top \varvec{\beta }_{\text {max}}^{N\!S}} +{{\beta }_{2}^{\text {Gaussian-Gumbel}}}{u^{S}({\varvec{s}},t)}, \qquad \end{aligned} \end{aligned}$$
(6)

where the terms \(\varvec{\beta }^{S}\) and \(u^{S}({\varvec{s}},t)\) with superscript S denote the sharing effects, and \({\textbf{x}}^{S}({\varvec{s}},t)\) are corresponding variables. The term \(\varvec{\beta }^{N\!S}\) with superscript NS denotes the non-sharing effects with corresponding covariates \({\textbf{x}}^{N\!S}({\varvec{s}},t)\) which includes the intercept and the other variables. For the selection of sharing and non-sharing variables, to avoid the potential issue of uncertainty of significant sharing effects (\({\beta }^{S}=0\)), we treat two significant predictors (altitude and precipitation) as sharing terms to investigate the potential similar and reverse effects by the ratios (\(\varvec{\beta }_1^{\text {Gaussian-Gumbel}}\)) while considering all other variables, including three non-significant ones (temperature, vapour pressure, and population density), and location variables as non-sharing terms.

Additionally, the spatio-temporal random effect is also treated as a sharing effect for computational simplicity. The sharing coefficients \(\varvec{\beta }_1^{\text {Gaussian-Gumbel}}\) and \({\beta }_2^{\text {Gaussian-Gumbel}}\) scale the common components of the predictor vector \({\varvec{x}}^S({\varvec{s}}, t)\) and the random effect \(u^S({\varvec{s}}, t)\), control how much information is shared from the average predictor towards the annual max predictor, and determine the strength of interaction between the two processes. In particular, they allow for capturing positive or negative correlations.

For further explanation, the sharing effects (including fixed effects and random effects) are the same in two sub-models (\(\varvec{\beta }^{S}\), \(u^{S}({\varvec{s}},t)\)). Meanwhile, they are linked by the sharing coefficients (scaling factors) \(\varvec{\beta }_{1}^{\text {Gaussian-Gumbel}}\) and \({\beta }_{2}^{\text {Gaussian-Gumbel}}\). On the one hand, these sharing coefficients relax the strictly equal relation between the sharing effects in two sub-models. On the other hand, more importantly, their posterior distributions can also measure the similarities and differences in the effects of predictors in the sub-models. For instance, a significantly negative \({\beta }_{1}^{\text {Gaussian-Gumbel}}\) implies that the corresponding predictors oppositely influence the moderate and extreme air pollution cases.
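The sharing mechanism in Eq. (6) can be illustrated with a small numerical sketch in Python. All numbers are made up, the function name is hypothetical, and \(\varvec{\beta }_{1}^{\text {Gaussian-Gumbel}}\) is read here as one scaling factor per shared covariate, which is an assumption about the notation rather than a statement of how the model was fitted.

```python
def joint_linear_predictors(x_s, x_ns, beta_s, beta_ns_mean, beta_ns_max,
                            scale_fixed, scale_random, u_s):
    """Return (mu_mean, mu_max) for one site-year under the sharing structure.

    x_s, beta_s, scale_fixed: shared covariates, their coefficients and the
    per-covariate sharing coefficients (assumed element-wise reading).
    x_ns, beta_ns_*: non-shared covariates (incl. intercept) and coefficients.
    u_s, scale_random: shared random effect and its sharing coefficient.
    """
    shared_mean = sum(x * b for x, b in zip(x_s, beta_s))
    shared_max = sum(x * b * k for x, b, k in zip(x_s, beta_s, scale_fixed))
    mu_mean = shared_mean + sum(x * b for x, b in zip(x_ns, beta_ns_mean)) + u_s
    mu_max = (shared_max + sum(x * b for x, b in zip(x_ns, beta_ns_max))
              + scale_random * u_s)
    return mu_mean, mu_max


# Made-up standardized covariates and coefficients, for illustration only.
mu_mean, mu_max = joint_linear_predictors(
    x_s=[1.0, -0.5], x_ns=[1.0], beta_s=[0.2, -0.1],
    beta_ns_mean=[3.0], beta_ns_max=[4.0],
    scale_fixed=[-1.5, 0.8], scale_random=1.2, u_s=0.1)
print(mu_mean, mu_max)
```

In this toy setting the first shared covariate raises the mean predictor but, because its scaling factor is negative, lowers the maxima predictor, which is the kind of reverse effect the sharing coefficients are meant to reveal.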

3.3 Priors definition

In a Bayesian context, in order to finalize the model, we need to define prior distributions for the remaining parameters in Gumbel (\(\sigma _{\text {Gumbel}}\)) and GEV distributions (\(\sigma _{\text {GEV}}\), \(\xi _{\text {GEV}}\)), the regression coefficients (\(\varvec{\beta }\)), the sharing coefficients in the joint model (\(\varvec{\beta }^{\text {Gaussian-Gumbel}}\)), the parameters in the Matérn covariance function (\(\sigma _{M}, \rho _{M}\), \(\nu _{M}\)) and the parameters in AR(1) dynamic model (a, \(\tau _{AR}\)).

We use vague Gaussian priors for the tail parameter (\(\xi _{\text {GEV}}\)) in the GEV distribution and for the elements of the coefficient vectors (\(\varvec{\beta }\), \(\varvec{\beta }^{\text {Gaussian-Gumbel}}\)). The smoothness parameter \(\nu _{M}\) is fixed at \(\nu _{M} = 1\), as in most spatial analyses. The parameters \(\sigma _{M}\) and \(\rho _{M}\) in the Matérn function and the autocorrelation parameter a in the AR(1) model are assigned penalized complexity (PC) priors (Simpson et al. 2017), with knowledge from Moraga (2019) and Fuglstad et al. (2019). The PC prior for the range parameter (\(\rho _{M}\)) is defined by \({\text {Prob}}\left( \rho _{M}<10^4\right) =0.01\), which means that the probability of the range being less than \(10 \textrm{km}\) is very small, and the PC prior for the marginal standard deviation by \({\text {Prob}} \left( \sigma _{M}>3\right) =0.01\), indicating that a standard deviation greater than 3 is unlikely. Similarly, we specify the PC prior for the autocorrelation (a) following the recommendation \({\text {Prob}}\left( a>0\right) =0.9\).

Note that INLA often uses the precision parameter (\(\tau\)) to replace the scale parameter (\(\sigma\)) by \(\sigma = {1}/{\sqrt{\tau }}\). The PC priors for all precision parameters are given by \({\text {Prob}}\left( {1}/{\sqrt{\tau }}>3\right) =0.01\) for Gumbel and GEV likelihood and \({\text {Prob}}\left( {1}/{\sqrt{\tau _{AR}}}>5\right) =0.01\) for the AR model.
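For intuition on these probability statements: in the standard construction of Simpson et al. (2017) for the precision of a Gaussian component, the PC prior is an exponential distribution on the standard deviation \(\sigma = 1/\sqrt{\tau }\), whose rate follows directly from the pair (u, \(\alpha\)) in \({\text {Prob}}(\sigma >u)=\alpha\). The Python sketch below computes that rate for the two statements above; treating the likelihood precisions in the same way is an assumption made here for illustration, and the function names are hypothetical.

```python
import math


def pc_rate(u, alpha):
    """Rate of the exponential prior on sigma with Prob(sigma > u) = alpha."""
    return -math.log(alpha) / u


def sd_tail_prob(u, rate):
    """Prob(sigma > u) under an exponential prior with the given rate."""
    return math.exp(-rate * u)


def precision_to_sd(tau):
    """Convert a precision into a standard deviation, sigma = 1 / sqrt(tau)."""
    return 1.0 / math.sqrt(tau)


rate_lik = pc_rate(3.0, 0.01)  # Prob(1/sqrt(tau) > 3) = 0.01
rate_ar = pc_rate(5.0, 0.01)   # Prob(1/sqrt(tau_AR) > 5) = 0.01
print(f"likelihood scale: rate = {rate_lik:.3f}, "
      f"prior median of sigma = {math.log(2) / rate_lik:.3f}")
print(f"AR(1) scale: rate = {rate_ar:.3f}, "
      f"prior median of sigma = {math.log(2) / rate_ar:.3f}")
```

A larger threshold u with the same \(\alpha\) gives a smaller rate, i.e., a prior that shrinks less strongly towards \(\sigma =0\).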

3.4 Model evaluation, diagnosis and cross-validation

Traditionally, Bayesian model performance is evaluated by two popular criteria, the deviance information criterion (DIC) and the Watanabe-Akaike information criterion (WAIC). The DIC, proposed by Spiegelhalter et al. (2002), is a criterion for model choice similar to the Akaike information criterion (AIC):

$$\begin{aligned} \text {DIC}=D( \widehat{\varvec{\theta }})+2 p_D, \end{aligned}$$

where \(D( \widehat{\varvec{\theta }})\) is the deviance function with Bayes estimate \(\widehat{\varvec{\theta }}\), and \(p_D\) is the effective number of parameters. The Watanabe-Akaike information criterion, also known as the widely applicable information criterion, is similar to the DIC, but the effective number of parameters is computed in a different way (Watanabe 2013).
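As a small worked example of the DIC formula, the Python sketch below uses the usual definition \(p_D=\overline{D}-D( \widehat{\varvec{\theta }})\) from Spiegelhalter et al. (2002), where \(\overline{D}\) is the posterior mean deviance; this definition is not spelled out in the text above, and the deviance values are made up.

```python
def dic_from_deviances(deviance_draws, deviance_at_estimate):
    """Return (DIC, p_D) from posterior deviance draws and D(theta_hat)."""
    mean_deviance = sum(deviance_draws) / len(deviance_draws)
    # Effective number of parameters: mean deviance minus plug-in deviance.
    p_d = mean_deviance - deviance_at_estimate
    return deviance_at_estimate + 2.0 * p_d, p_d


# Made-up deviance values, for illustration only.
dic, p_d = dic_from_deviances([102.0, 98.0, 101.0, 99.0], 97.0)
print(f"DIC = {dic}, p_D = {p_d}")
```

Equivalently, \(\text {DIC}=\overline{D}+p_D\), so the criterion adds a complexity penalty to the average fit.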

However, DIC may under-penalize complex models with many random effects. Alternatively, for prediction performance, INLA provides leave-one-out cross-validation criteria: the conditional predictive ordinates (CPO; Pettit 1990) with their summary version, the logarithmic score (LS; Gneiting and Raftery 2007), and the probability integral transform (PIT; Marshall and Spiegelhalter 2003), which facilitate the computation of the cross-validated log-score for model choice and enable the calibration assessment of out-of-sample predictions, respectively.

$$\begin{aligned} \begin{aligned} {\text {LS}}&= - \sum _{i=1}^{n} \ln {\text {CPO}_i} \quad \text {with\ } \textrm{CPO}_i =\pi \left( y_i^{\text{ obs } } \mid y_{-i}\right) ,\\ \text {PIT}_i&={\text {Prob}}\left( Y_i \le y_i^{\text{ obs } } \mid y_{-i}\right) , \end{aligned} \end{aligned}$$

where \(y_i^{\text{ obs } }\) denotes the i-th observation and \(y_{-i}\) denotes the observations y with the i-th component omitted. A model with a small value of LS and a standard uniform distributed PIT is preferable (Gómez Rubio 2020).
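Given CPO and PIT values returned by a fitted model, both summaries are simple to compute. The Python sketch below, with made-up inputs and hypothetical function names, evaluates LS and bins the PIT values so that departures from uniformity can be inspected.

```python
import math


def log_score(cpo):
    """Logarithmic score: negative sum of the log CPO values."""
    return -sum(math.log(c) for c in cpo)


def pit_histogram(pit, bins=10):
    """Counts of PIT values in equal-width bins on [0, 1]."""
    counts = [0] * bins
    for p in pit:
        # A PIT value of exactly 1 is placed in the last bin.
        counts[min(int(p * bins), bins - 1)] += 1
    return counts


# Made-up CPO and PIT values, for illustration only.
print(log_score([0.5, 0.25, 0.125]))
print(pit_histogram([0.05, 0.15, 0.95, 1.0]))
```

Roughly equal bin counts indicate well-calibrated predictions, while a U-shaped or hump-shaped histogram points to predictive distributions that are too narrow or too wide, respectively.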

Note that numerical problems may occur when CPO and PIT values are computed for a complicated Gaussian process with the INLA algorithm (Held et al. 2010). Hence, we take a few other criteria from the Bayesian literature into account. The coverage probability of the 95% CI is computed as the proportion of validation observations whose observed value lies between the 2.5% and 97.5% quantiles of the posterior predictive distribution. The correlation coefficient denotes the correlation between observed and predicted values in the validation set. The root mean square error (RMSE) is defined as the square root of the second sample moment of the differences between predicted and observed values.
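These three validation criteria can be sketched in a few lines of Python. The inputs are made up, the function names are hypothetical, and the interval bounds are assumed to be the 2.5% and 97.5% posterior predictive quantiles supplied by the fitted model.

```python
import math


def coverage(obs, lower, upper):
    """Share of observations falling inside their [lower, upper] interval."""
    inside = sum(lo <= y <= hi for y, lo, hi in zip(obs, lower, upper))
    return inside / len(obs)


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


# Made-up log-scale observations, predictions and interval bounds.
obs = [3.0, 3.5, 4.0, 4.5]
pred = [3.1, 3.4, 4.2, 4.3]
lower = [2.9, 3.2, 4.1, 4.0]
upper = [3.3, 3.6, 4.4, 4.4]
print(coverage(obs, lower, upper), pearson(obs, pred), rmse(obs, pred))
```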

As we separate the original five-year dataset (2017–2021) into training (2017–2020) and validation (2021) sets, we decide to apply DIC, WAIC, LS, PIT and RMSE to compare the model fitness on the training set and use the coverage probability, correlation coefficient, and RMSE for predictive ability evaluation.

4 Results

In this section, we first compare the performance of the Bayesian generalised extreme models and select the best one with the outstanding predictive ability (Sect. 4.1), then summarize the corresponding posterior estimates for both fixed and random effects (Sect. 4.2). The analysis and interpretation of the joint model are included in Sect. 4.3 and followed by the excursion functions generation with severe air pollution risk ranking (Sect. 4.4). Note that since the explanatory variables are measured in different scales, to avoid numerical problems, each predictor is standardized to have mean zero and unit standard deviation.

4.1 Model comparison

To examine the fit and predictive ability of the extreme value models, we apply DIC, WAIC, LS, PIT and RMSE on the training set (Table 3), and coverage probability, correlation and RMSE on the validation set (Table 4). Table 3 shows that, on the training set, Models 2 and 4 outperform Models 1 and 3 in DIC (36.27 and 73.43, respectively), WAIC (193.67 and 223.06, respectively) and RMSE (0.24 and 0.18, respectively). However, Model 1 is preferable under the leave-one-out cross-validation criteria, namely LS (8019.07) and the PIT plots (Fig. 3). Table 4 shows that all four models are comparable in coverage probability, correlation and RMSE.

Figure 4 shows the simultaneous visualization of model performance on both training and validation sets, where the scatters’ distribution along the line with intercept 0 and slope 1 indicates the estimated (predicted) values are best suited to the observations. All four models exhibit a strong fit for the trend.

To summarise, no model excels in all criteria. Given the similar performance of all four models on the validation set, we choose the parsimonious model, Model 1 (the Gumbel model with fixed effects and spatio-temporal random effects defined in Eq. (1)), as the best model and carry its structure forward into further analysis.
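Since Model 1 is a Gumbel model assessed partly through PIT plots, a small sketch may help show what a PIT value is. It assumes the location-scale Gumbel distribution for maxima with CDF \(\exp \{-\exp [-(y-\mu )/\sigma ]\}\) and plugs in fixed, hypothetical parameters; the paper's leave-one-out PIT values come from the fitted posterior predictive distributions, which this sketch does not reproduce.

```python
import numpy as np

def gumbel_cdf(y, mu, sigma):
    """CDF of the Gumbel distribution for maxima with location mu and scale sigma."""
    z = (np.asarray(y, dtype=float) - mu) / sigma
    return np.exp(-np.exp(-z))

rng = np.random.default_rng(0)
mu, sigma = 3.5, 0.4  # hypothetical parameters, not estimates from the paper

# Simulate observations from the assumed model, then evaluate its CDF at each one
y = rng.gumbel(loc=mu, scale=sigma, size=2000)
pit = gumbel_cdf(y, mu, sigma)

# Histogram counts over ten equal bins; roughly equal counts suggest good calibration
counts, edges = np.histogram(pit, bins=10, range=(0.0, 1.0))
```

When the predictive distribution matches the data-generating process, the PIT values are approximately uniform on (0, 1); U-shaped or hump-shaped histograms point to predictive distributions that are too narrow or too wide, respectively.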

Table 3 Performance evaluation criteria (DIC, WAIC, LS and RMSE) of Models 1–4 specified in Eqs. (1), (3)–(5) on the training set (all criteria prefer lower values)
Fig. 3 Probability integral transform (PIT) plots for Models 1–4 specified in Eqs. (1), (3)–(5) in panels a–d, based on the training dataset. A uniform distribution pattern is preferable

Table 4 Performance evaluation criteria (coverage probability, correlation and RMSE) of Models 1–4 specified in Eqs. (1), (3)–(5) on the validation set
Fig. 4 Visualisation of training (red) and validation (cyan) performance in panels a–d for Models 1–4 specified in Eqs. (1), (3)–(5), respectively. Points lying close to the line with intercept 0 and slope 1 indicate a better model

4.2 Summary of Model 1

The summary statistics for the fixed effects are shown in Table 5. Altitude and population density are significantly negatively associated with annual maximum \({\text{PM}}_{10}\) concentrations, whereas the effects of the other covariates are not statistically significant. High temperature, low precipitation and low vapour pressure tend to be associated with extreme \({\text{PM}}_{10}\) concentrations, consistent with Kalisa et al. (2018) and Li et al. (2014).

The negative association between population density and annual maximum \({\text{PM}}_{10}\) concentrations contrasts with the positive correlation found in Table 2 and the positive association reported by Borck and Schrauth (2021). Taken together with the relatively low correlation with population density (Table 2) and the observation that severe \({\text{PM}}_{10}\) pollution in 2017, 2020 and 2021 (Fig. 2) is usually associated with high temperature and low precipitation, our results suggest that the generation and spread of extreme \({\text{PM}}_{10}\) concentrations depend heavily on climate conditions. This agrees with findings on the dominant role of adverse meteorology in severe air pollution events (Wang et al. 2020; Morawska et al. 2021).

The spatial and temporal dependence can be assessed from the spatial heat map (Fig. 5) and the posterior estimates of the autocorrelation coefficient (a in Table 6). Spatially, similar values of the mean random effect cluster together; in particular, a large cluster of high values appears in the centre of the mainland. Temporally, the posterior mean of the autocorrelation coefficient (a) is 0.80 with 95% credible interval (0.74, 0.85), providing evidence of relatively strong dependence between consecutive years.
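To give a sense of what an autocorrelation coefficient of 0.80 implies beyond one year, the sketch below uses the standard result that a stationary AR(1) process has lag-k correlation \(a^{|k|}\). Stationarity is an assumption here, and the function name is illustrative.

```python
import numpy as np

def ar1_correlation(a, lag):
    """Lag-k correlation of a stationary AR(1) process with coefficient a."""
    return a ** np.abs(lag)

# Five annual time points (2017-2021) give lags 0 to 4
lags = np.arange(0, 5)

# Posterior mean and 95% credible bounds of a as reported in the text
for label, a in [("lower", 0.74), ("mean", 0.80), ("upper", 0.85)]:
    print(label, np.round(ar1_correlation(a, lags), 3))
```

Under this assumption, the correlation implied by a = 0.80 decays to about 0.64 after two years and about 0.41 after four years, so the temporal random effect still carries noticeable information across the whole study period.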

Table 5 Posterior estimates (mean, standard deviation and quantiles) of the coefficients of the covariates involved in Model 1 specified in Eq. (1)
Table 6 Posterior estimates of mean, standard deviation and quantiles of the parameters in Model 1 specified in Eq. (1)
Fig. 5 Heat map of the spatial random effect in 2017 with a mean and b standard deviation

4.3 Results of the joint model with sharing effects

Combining the best extreme value model (Model 1) with the Gaussian model, we fit the joint model of Sect. 3.2 with sharing effects to estimate both maxima and mean concentration levels simultaneously. We keep the effects of the two significant predictors (altitude and precipitation) as sharing effects, and treat all other effects (temperature, vapour pressure, population density and location) as non-sharing ones. Table 7 summarizes the posterior estimates of these effects from the joint model. Precipitation shows significant negative associations with both annual mean and annual maximum concentrations, which is consistent with the notion that \({\text{PM}}_{10}\) concentrations are generally low in wet regions (Li et al. 2014). In contrast, altitude is negatively associated with the mean but positively associated with the maxima.

This is also supported by the sharing coefficients. We see from Fig. 6 that the posterior distribution of the sharing coefficient of precipitation (see the definition of \(\varvec{\beta }_{1}^{\text {Gaussian-Gumbel}}\) in Sect. 3.2) lies almost entirely between 0 and 1, indicating that precipitation has a similar influence on extreme and moderate air pollution. The coefficient for altitude is negative, providing some evidence that altitude acts in opposite directions on pollution at different scales.
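To make the reading of the sign and size of a sharing coefficient concrete, the following is a schematic sketch and not a restatement of Eq. (6). It assumes, hypothetically, that a shared covariate effect enters the Gaussian (mean) linear predictor as \(\gamma \, x(s,t)\) and is copied into the Gumbel (maxima) linear predictor scaled by a sharing coefficient \(\beta\); the symbols \(\gamma\), \(\beta\), \(x\) and \(\eta\) are illustrative.

```latex
\begin{aligned}
\eta^{\text{mean}}(s,t) &= \dots + \gamma \, x(s,t), \\
\eta^{\text{max}}(s,t)  &= \dots + \beta \, \gamma \, x(s,t).
\end{aligned}
```

Under this assumption, a value \(0< \beta < 1\) means that the covariate acts in the same direction in both sub-models, with a smaller magnitude in the maxima predictor, while \(\beta < 0\) reverses the sign of the effect between the two sub-models.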

Furthermore, the effect of temperature is not significant at the 95% credible level for the mean but becomes significant for the maxima. This result implies that high temperatures only weakly affect general air quality while playing an essential role in the dispersion of extreme pollution. Nevertheless, unmeasured confounders may distort the coefficients, so this inference should be treated with caution.

In terms of model performance, the joint model shows promising results on the validation set. In particular, the Gumbel sub-model (max sub-model) improves on the univariate Gumbel model (Model 1) in coverage probability (\(88.63\%\)) and correlation (\(46.71\%\)), as shown in Tables 4 and 8. Further detail is given in Fig. 7, where the performance of both the Gumbel sub-model (maxima) and the Gaussian sub-model (mean) is satisfactory. Therefore, although the joint model is less parsimonious, which may introduce more volatility into the predictions of the Gumbel sub-model, the overall evaluation results support the credibility of its findings, and we use the joint model for the subsequent predictions based on excursion functions.

Table 7 Posterior estimates (mean and 95% credible interval) of the sharing effects and non-sharing effects specified in Eq. (6)
Table 8 Sub-models performance evaluation (coverage probability, correlation and RMSE) in the validation set
Fig. 6 a The posterior distributions plot and b the quantiles plot of sharing coefficients. The three nodes in the quantiles plot indicate 0.025, 0.5 and 0.975 quantiles of the posterior estimates

Fig. 7 a Histogram of PIT for joint model defined in Eq. (6) on training set, b the mean sub-model and c the max sub-model performance in both training and validation sets

4.4 Annual maxima prediction and excursion functions

Hot-spot identification and maps of predicted concentration levels are important tools in air pollution studies, because they convey the air quality in specific regions intuitively and are easily interpreted by environmental agencies and the general public. Bolin and Lindgren (2015) proposed the positive (\(\textrm{E}_{u, \alpha }^{+}(X)\)) and negative excursion sets (\(\textrm{E}_{u, \alpha }^{-}(X)\)), defined as the largest sets on which the field simultaneously exceeds or falls below the risk level (u) with probability at least \(1-\alpha\) for a small error probability (\(\alpha\)); the joint probabilities are estimated using a parametric family of candidate sets and sequential importance sampling. To visualize the excursion sets for all values of \(\alpha\) at once, we apply the positive and negative excursion functions, \(F_u^{+}(s)=1-\inf \left\{ \alpha \mid s \in \textrm{E}_{u, \alpha }^{+}\right\}\) and \(F_u^{-}(s)=1-\inf \left\{ \alpha \mid s \in \textrm{E}_{u, \alpha }^{-}\right\}\). Here, the term \(\inf \left\{ \alpha \mid s_0 \in \textrm{E}_{u, \alpha }^{+}\right\}\) denotes the smallest \(\alpha\) for which the location (\(s_0\)) is first included in the positive excursion set \(\textrm{E}_{u, \alpha }^{+}\), so a higher value of \(1-\inf \left\{ \alpha \mid s_0 \in \textrm{E}_{u, \alpha }^{+}\right\}\) reported by the positive excursion function indicates stronger evidence that the location (\(s_0\)) belongs to a region that simultaneously exceeds the risk threshold u.
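The construction of an excursion function can be sketched with plain Monte Carlo. The sketch assumes that joint posterior draws of the field are available at all locations, restricts the candidate sets to nested sets ordered by marginal exceedance probability, and estimates joint probabilities by simple sample proportions instead of the sequential importance sampling used by Bolin and Lindgren (2015). The function name and the simulated draws are hypothetical.

```python
import numpy as np

def excursion_function(samples, u, positive=True):
    """Monte Carlo excursion function from joint draws.

    samples: array of shape (n_draws, n_locations) with joint posterior draws.
    Returns one value per location: the estimated joint probability that the
    location and every location ranked above it lie on the required side of u.
    """
    samples = np.asarray(samples, dtype=float)
    inside = samples > u if positive else samples < u
    marginal = inside.mean(axis=0)                 # marginal probabilities
    order = np.argsort(-marginal, kind="stable")   # most likely locations first
    # nested[d, k] is True when draw d satisfies the condition at the first k+1
    # ordered locations simultaneously
    nested = np.logical_and.accumulate(inside[:, order], axis=1)
    joint = nested.mean(axis=0)                    # non-increasing in k
    F = np.empty(samples.shape[1])
    F[order] = joint
    return F

rng = np.random.default_rng(1)
# Hypothetical draws at five locations that share a common random shift
shift = rng.normal(0.0, 5.0, size=(4000, 1))
centre = np.array([70.0, 62.0, 55.0, 48.0, 35.0])
draws = centre + shift + rng.normal(0.0, 3.0, size=(4000, 5))

F_pos = excursion_function(draws, u=50.0)
F_neg = excursion_function(draws, u=50.0, positive=False)
```

Because the joint probability of a set can never exceed the marginal probability of any of its members, the excursion function at a location is bounded above by its marginal exceedance probability, which is why excursion maps are more conservative than maps of marginal probabilities.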

In our case, to identify the areas that are most and least likely to suffer from severe \({\text{PM}}_{10}\) pollution, we use the predicted \({\text{PM}}_{10}\) concentration levels from the max sub-model of the joint model to generate both positive and negative excursion functions with the thresholds 50 μg/m\(^3\) (poor) and 100 μg/m\(^3\) (very poor) at 548 locations distributed throughout mainland Spain, comprising a 0.5° \(\times\) 0.5° (roughly 50 km \(\times\) 50 km) grid (206 locations) and the locations of all \({\text{PM}}_{10}\) stations (342 monitors).

For the 50 μg/m\(^3\) threshold (Fig. 8), the probability of simultaneous exceedance is high in the northwest, centre and south, meaning that poor \({\text{PM}}_{10}\) air quality is likely to affect these locations during the year. In contrast, the negative excursion function with threshold 50 μg/m\(^3\) indicates that the regions in the north and east enjoy good or moderate air quality throughout the year. For the 100 μg/m\(^3\) threshold (Fig. 9), the probability of very poor \({\text{PM}}_{10}\) pollution is low (in white) in most regions, although it remains likely in certain areas of the Community of Madrid; meanwhile, most areas in the north, northeast and east are expected to stay below this threshold.

Fig. 8 Positive and negative excursion functions with threshold 50 μg/m\(^3\) are displayed in a and b, respectively. Annual maximum concentration levels for locations in red are likely to exceed 50 μg/m\(^3\), and concentration levels for locations in cyan are probably below the threshold

Fig. 9 Positive and negative excursion functions with threshold 100 μg/m\(^3\) are displayed in a and b, respectively. Annual maximum concentration levels for locations in red are likely to exceed 100 μg/m\(^3\), and concentration levels for locations in cyan are probably below the threshold

5 Discussions and conclusions

In this paper, we investigated the spatio-temporal patterns of \({\text{PM}}_{10}\) concentration levels in mainland Spain within the INLA-EVT framework. Jointly studying the annual maxima and annual means of daily PM\(_{10}\) concentration levels was feasible and informative for three reasons. Firstly, Spain is a mountainous country with a large central plateau, and this complicated orography is usually associated with a variety of climatic conditions. Secondly, the air pollution monitors distributed throughout mainland Spain provide high-quality time series data, which support accurate prediction and credible inference at both local and national levels. Finally, Spain generally enjoys good air quality (low annual mean), but extreme air pollution does appear on certain days during the year (high annual maxima). This circumstance allows us to investigate potential differences in the generation and spread of moderate and extreme pollution.

We establish a series of Bayesian spatio-temporal models for extreme \({\text{PM}}_{10}\) concentrations in Spain, with meteorological predictors, human effects, and spatio-temporal random effects following SPDE and AR(1) dynamics to account for the dependence not explained by the covariates. Using the Bayesian joint model with sharing effects, we also provide evidence of similar and reverse connections between influential predictors and \({\text{PM}}_{10}\) concentrations at different scales, and we generate grid-level maps of the annual maxima excursion functions to highlight the regional risk ranking.

Although most statistical studies focus on moderate cases with long-term exposure, in the epidemiological field short-term exposure to severe particulate matter is considered an essential public health issue, with major acute cardiovascular problems and health economic consequences. For example, Brook et al. (2016) emphasized that people living in highly polluted regions are likely to experience heart failure hospitalisations and cardiovascular mortality at rates several times higher. Shah et al. (2013) also pointed out that even modest improvements in air quality are projected to have major population health benefits and substantial health-care cost savings, preventing thousands of heart failure hospitalisations and saving millions of US dollars a year.

In light of these health and economic hazards, the main findings in this paper are expected to provide in-depth knowledge of how extreme air pollution spreads, promote awareness of extreme value studies, and offer suggestions to national governments regarding legislation on extremely poor air quality and the protection of human health. In particular, the joint model incorporating sharing effects highlights the potential reverse or similar impacts of altitude and precipitation on moderate and extreme cases of \({\text{PM}}_{10}\) pollution. Moreover, the excursion function maps indicate that the central region of Spain is more likely to experience severe \({\text{PM}}_{10}\) pollution; these maps can be applied in research on long-term effects and health outcomes in epidemiological studies, such as acute cardiovascular events (Mustafić et al. 2012; Shah et al. 2013) and various types of strokes (Yu et al. 2014).

In conclusion, our study provides valuable insights into the generation and spreading of extreme PM\(_{10}\) pollution through innovative methods incorporating sharing effects in the joint model. This approach holds promise for future environmental and epidemiological studies in exploring various air pollutants and their relationship with meteorological variables and anthropogenic factors.