Spatio-temporal joint modelling on moderate and extreme air pollution in Spain

Very unhealthy air quality is consistently connected with numerous diseases. Appropriate extreme analysis and accurate predictions are in rising demand for exploring potential linked causes and for providing suggestions for the environmental agency in public policy strategy. This paper aims to model the spatial and temporal pattern of both moderate and extremely poor PM10\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$_{10}$$\end{document} concentrations (of daily mean) collected from 342 representative monitors distributed throughout mainland Spain from 2017 to 2021. We first propose and compare a series of Bayesian hierarchical generalized extreme models of annual maxima PM10\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$_{10}$$\end{document} concentrations, including both the fixed effect of altitude, temperature, precipitation, vapour pressure and population density, as well as the spatio-temporal random effect with the Stochastic Partial Differential Equation (SPDE) approach and a lag-one dynamic auto-regressive component (AR(1)). Under WAIC, DIC and other criteria, the best model is selected with good predictive ability based on the first four-year data (2017–2020) for training and the last-year data (2021) for testing. We bring the structure of the best model to establish the joint Bayesian model of annual mean and annual maxima PM10\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$_{10}$$\end{document} concentrations and provide evidence that certain predictors (precipitation, vapour pressure and population density) influence comparably while the other predictors (altitude and temperature) impact reversely in the different scaled PM10\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$_{10}$$\end{document} concentrations. The findings are applied to identify the hot-spot regions with poor air quality using excursion functions specified at the grid level. It suggests that the community of Madrid and some sites in northwestern and southern Spain are likely to be exposed to severe air pollution, simultaneously exceeding the warning risk threshold.


Introduction
Air pollution, composed of particulate matter (PM) and gaseous pollutants, has a substantial negative impact on the environment, ecosystem and human health.Poor air quality is one of the five most significant health risks worldwide, alongside high blood pressure, smoking, diabetes and obesity (Cohen et al., 2017;Daellenbach et al., 2020).It becomes one of the most considerable health concerns for the residents in areas of higher population density (Dias and Tchepel, 2018), centres with dense activities, and to particular user groups (Agarwal and Kaddoura, 2019;Singh et al., 2021).Among all the pollutants, the particulate matter, with an aerodynamic diameter less than or equal to 10 and 2.5 microns respectively (PM 10 and PM 2.5 ) are most consistently connected with numerous adverse health outcomes including lung infections, cardiovascular diseases, and respiratory problems (Joseph et al., 2003;Martuzzi et al., 2006;Samoli et al., 2013), while appropriate regulation directly reduces the adverse health effects, increases general well-being, and improves public health (Steinle et al., 2015).
In Europe, although the European Environment Agency (EEA) maintains a rather dense particulate matter monitoring network to record the concentration levels across countries, huge regions of the European continent remain unmonitored.For proper assessment of population-wide exposure and appropriate formulation of pollution mitigation strategies, the responsible authorities need accurately estimate and predict the concentration levels at the unobserved locations (Chu et al., 2015).
The main challenge to forecasting particulate matter concentrations corresponds to the complexity of PM generation and spreading dynamics.On the one hand, the PM generation is dominated by two complicated sources, inorganic aerosol coming from the agriculture, long-range transport and energy sectors (Steinle et al., 2013;Daellenbach et al., 2020), as well as organic aerosol coming from biomass and fossil fuels burning emissions, vehicles emissions and cooking (Lenschow et al., 2001;Omidvarborna et al., 2015).On the other hand, the PM spread depends on both meteorological conditions and land use dispersion, leading the observed concentration levels to fluctuate geographically and temporally.
In the statistical literature, the Bayesian spatio-temporal model that allows modelling a complex environmental phenomenon through a hierarchy of sub-models becomes one of the most promising methodologies in air quality scientific investigations (Cameletti et al., 2013;Amin et al., 2015;Taheri Shahraiyni and Sodoudi, 2016;Forlani et al., 2020;Fioravanti et al., 2021;Castro-Camilo et al., 2021).In particular, this approach allows involving the explanatory variables to explain the large-scale variability, take residual dependency into account through a space-time process with a Gaussian random field (GRF), and produce high-resolution spatial forecasts to meet the rising demand for predictive concentration maps in epidemiological studies.
According to Porcu et al. (2012), the main drawback of the Bayesian model with a GRF refers to the computational difficulty to deal with enormous amounts of data, especially applying complex spatial dependence measures (i.e., the Matérn covariance function).Some strategies have been proposed to alleviate the computational burden of fitting complex spatial and temporal models.Lindgren et al. (2011) proposed the stochastic partial differential equation (SPDE) approach, providing a method to represent a continuous Matérn field through a discretely indexed Gaussian Markov random field (GMRF) associated with a sparse precision matrix, which enjoys good computational property.Rue et al. (2009) also provided the Integrated Nested Laplace Approximation (INLA) algorithm that performs direct numerical calculations on the marginal posterior distributions, avoiding the time-consuming Markov chain Monte Carlo (MCMC) simulations.Additionally, GMRF with SPDE approach can be fitted in a Bayesian hierarchical framework through the INLA approach, with implementation in the R-INLA package available at https://www.r-inla.org/,making this methodology fast and easily implemented.
Most previous spatial and temporal studies on air pollution only concentrated on moderate (i.e., daily, monthly and annual mean) PM concentrations (Cameletti et al., 2013;Beloconi et al., 2018;Fioravanti et al., 2021;Saez and Barceló, 2022).However, extreme conditions are actually more concerned with environmental quality management due to their various hazardous impacts (Amin et al., 2015).Numerous epidemiological studies pointed out that shortterm exposures to severe particulate matter pollution can trigger serious acute cardiovascular and respiratory mortality (Orellano et al., 2020;Lei et al., 2019;Zhang et al., 2019;Yu et al., 2014;Brook et al., 2016) and huge economic loss in the corresponding hospitalization (Xie et al., 2021;Shah et al., 2013).In the field of extreme case spatiotemporal analysis, Sharma et al. (2012); Rodríguez et al. (2016); Amin et al. (2015); Martins et al. (2017) and Castro-Camilo et al. (2021) typically focused on a small spatial domain, making it difficult to consider the complicated orography with a variety of climatic conditions, as well as to provide general suggestions to national governments on the environmental policy formulation and health care allocation.More importantly, to our best knowledge, no studies consider the potential difference between moderate and extreme air pollution, in other words, model different scaled air pollution simultaneously to identify similarities and differences in the effects of influential factors.
In this paper, we focus on the spatial and temporal variation of both moderate and extreme air pollution (i.e., annual mean and annual maxima of daily PM 10 concentration levels) in Spain from 2017 to 2021, after controlling for meteorological variables and socio-economic factors.The contribution is two-fold, the predictions of extreme pollution (excursion functions) and the investigation of similar/reverse effects of predictors in different scaled cases.Firstly, we establish several Bayesian hierarchical generalized extreme models on annual maxima and select the best model based on their predictive performance, detailed in Sections 3.1 and 4.1.Secondly, we utilize the joint Bayesian model with sharing effects in Sections 3.2 and 4.3 to model both annual mean and maxima concentrations simultaneously.We observe the comparable influence of precipitation, vapour pressure, and population density, as well as the possible opposite effects of altitude and temperature.We also generate excursion function maps (Bolin and Lindgren, 2015) based on the joint model to highlight the regional risk ranking that simultaneously exceeds the warning risk threshold in Section 4.4.These main findings, comprehensive knowledge on the PM 10 generation and spread with high-resolution spatial forecasts are expected to promote awareness of the significance of extreme air pollution research, help in the investigation of the long-term effect in epidemiological studies, and underpins air pollutants regulation and human health protection strategies for environmental agencies.
The rest of the paper is organized as follows.In Section 2, we demonstrate the dataset for response and main explanatory variables.In Section 3, we formalize spatio-temporal Bayesian generalized extreme models on moderate-extreme air pollution in the framework of Bayesian spatial analysis and extreme value theory.We present the main results, applications and potential influence in Section 4, and conclude this paper with extensional discussions in Section 5.

Data 2.1 PM 10 concentrations and spatial domain
In mainland Spain, the PM 10 concentration levels data is accessible by the Air Quality e-Reporting which is EEA's air quality database consisting of a multi-annual time series data of air quality measurement and calculated statistics for a number of air pollutants.To work with a more robust dataset, we only retain 342 air pollution stations that have at least 60% valid observations in a year, with the geographical distribution of the stations shown in red circles embedded in the mesh constructed for the SPDE approach (Figure 1).This five-year dataset (2017-2021) with 1470 observations is divided into the training set (2017-2020; 1215 observations) and the validation set (2021; 255 observations), which are used to evaluate and compare the model fitness and predictive ability.Figure 2 shows the temporal and spatial variations of extreme PM 10 following the EEA's air quality categories with 0-20µg/m 3 (Good), 20-40µg/m 3 (Fair), 40-50µg/m 3 (Moderate), 50-100µg/m 3 (Poor), 100-150µg/m 3 (Very poor), more than 150µg/m 3 (Extremely poor).Temporally, severe pollution seems to occur in 2017 and 2020, as indicated by the numerous monitors coloured in red (very poor) and purple (extremely poor).Spatially, compared with relatively low annual maxima recorded in the east (Valencian Community) and north (Basque Country), high

Explanatory variables
A number of potential predictors are available based on prior findings in the air quality literature (Cameletti et al., 2013;Fioravanti et al., 2021;Castro-Camilo et al., 2021), and we choose to include a set of five main spatial and spatio-temporal varying predictors with the complete description list reported in Table 1.
In the following, we describe the selected predictors in detail.
Meteorological variables.The meteorological variables (temperature, precipitation and vapour pressure) of the monthly mean are collected from the CRU TS (Climatic Research Unit gridded Time Series; Harris et al., 2020) dataset and aggregated to be annual mean.Accordingly, CRU TS was first published in 2000, using ADW (angulardistance weighting) to interpolate anomalies of monthly observations onto a 0.5°grid over land surfaces (excluding Antarctica) for observed and derived variables (mean, minimum and maximum temperatures, precipitation, vapour pressure, wet days and cloud cover) with no missing values in the defined domain.
Elevation.The altitude data, height over sea level, matched with locations of all air pollution monitors, is accessible in the annual aggregated air quality values dataset provided by EEA, available at https://discomap.eea.europa.eu/App/AirQualityStatistics/index.html.
Population density.The population densities are calculated in each autonomous community.The original data is collected from the statistics report (available at https://stats.oecd.org/) of the Organisation for Economic Cooperation and Development (OECD).The OECD statistics contain data and metadata for economic and education indexes of OECD countries and some selected non-member economies.
We see in Table 2 that the correlations of potential predictors to extreme and average PM 10 display a similar or different direction, which also vary year by year.This will be further investigated by our spatio-temporal generalized extreme model and joint model with sharing effects, see details in both Sections 3.1 and 3.2.Furthermore, considering the correlation between location variables and meteorological variables (e.g., latitude and temperature), we adjust the location variables (longitude and latitude) as covariates to investigate the impact of meteorological factors.

Model Formulation
In Section 3.1, we provide first a brief introduction to spatio-temporal Gaussian field and extreme value models, followed by four candidate models for the annual maximum of daily PM 10 in mainland Spain.Subsequently, in Section 3.2, we present the joint Bayesian Gumbel-Gaussian model for both extreme and moderate levels.

Spatio-temporal extreme value models with mixed effects
The generalized extreme value (GEV) distribution is widely employed for modelling extremes in environmental science (Reiss and Thomas, 2007), such as temperature (Cheng et al., 2014), precipitation (Panagoulia et al., 2014), air pollution (Deng and Zhang, 2018;Martins et al., 2017) and sea level (Lobeto et al., 2018).The GEV distribution has three parameters, location parameter (µ, −∞ < µ < ∞), scale parameter (σ, σ > 0) and tail parameter (ξ, −∞ < ξ < ∞) with the cumulative distribution function The case ξ = 0 is interpreted as the limit case of ξ → 0, leading to the Gumbel family with distribution function Considering the potential spatial and temporal dependence among the PM 10 concentrations, we use two typical approaches for measurement: Matérn covariance function (Matérn, 1986;Guttorp and Gneiting, 2006) with the Stochastic Partial Differential Equations approach (SPDE; Lindgren et al., 2011) approximation for spatial correlation and the auto-regressive dynamic model (AR; Shumway and Stoffer, 2011) for temporal dependence.We establish two groups of generalised extreme value models with similar fixed effects and varying random effects to model the annual maxima PM 10 concentrations.
Model 1: Gumbel model with fixed effect and spatio-temporal random effect.Let y max (s, t) denote the logarithm transform of annual maxima PM 10 concentrations at location s ∈ S and year t ∈ T , where S is the study area and T is the time period in focus.Under the assumption of constant scale (σ) and tail (ξ) parameters, we use a linear combination of fixed effects with explanatory variables and spatio-temporal varying random effect to model the location parameter (µ(s, t)) in Gumbel model below.Suppose that Here, the fixed effects are associated with the vector x(s, t) including an intercept and the explanatory variables of location variables, meteorological variables and human-effect variables listed in Table 1, and the vector β corresponds to the regression coefficients associated with the fixed effects.The term u(s, t) represents a spatio-temporal varying random effect that incorporates spatio-temporal interaction (Cameletti et al., 2013).It temporally changes according to AR(1) dynamics with autocorrelation parameter a and spatial correlated and serially independent innovations w (s, t).
Given two locations s i and s j separated by h = d(s i , s j ) (normally Euclidean) units, the Gaussian process w(s, t) with mean 0 and Matérn covariance function is in the form of where Γ is the gamma function, K ν is the modified Bessel function of the second kind, ρ > 0 is the range parameter, ν > 0 is the smoothness parameter, and σ 2 > 0 for the marginal variance.
Model 2: Gumbel model with fixed effect and separated spatial and temporal random effects.Our second model is a modification of Model 1 specified in Eq.( 1) with spatial and temporal random effects and separable interaction effects specified in Eq.( 2), namely, we keep the Gumbel distribution assumption on y max (s, t) as below. [ (3) Note that u(s, t) is defined the same as in Eq.( 2), indicating the spatio-temporal interaction term with the Kronecker product, see e.g., Cameletti et al. (2011Cameletti et al. ( , 2013)); Fioravanti et al. (2021).The f (t) denotes the non-linear random effect in the temporal structure of AR(1), the w(s, t) is the spatially dependent only random effect with SPDE structure.Specifically, the implementation of the AR(1) model in INLA generally assumes the Gaussian white noise with mean 0 and precision τ AR .For f (t) defined over the naturally binned covariate (Year), let t = (t 1 , . . ., t 5 ) ⊤ denotes the time from the first year (t 1 ) to the last year (t 5 ), AR , i = 2, . . ., 5, where −1 < a < 1 is a numeric constant, the so-called autocorrelation, by which we multiply the lagged variable f (t i−1 ), and ϵ i denotes the unpredictable error in the form of Gaussian white noise.
Model 3: GEV model with fixed effect and spatio-temporal random effect.The generalised extreme value (GEV) models are basically following the same structure as Gumbel models except the generalized extreme value distribution for the response.To be specific, we suppose that where the location parameter µ(s, t) is of the same form of mixed effects as in Eq.( 1) and random effects in Eq. (2).
Model 4: GEV model with fixed effect and separated spatial and temporal random effect.Similar consideration of Model 2 modified from Model 1, we consider the following model in parallel with Model 3, i.e., we take spatial and temporal random effects with an interaction effect into the GEV model.Suppose that with the location parameter µ(s, t) is of spatio-temporal structure of the form in Eq.(3).
It is worth pointing out that the well-known limit type theorem in Coles (2001) motivates our GEV distribution assumption of annual maxima of PM 10 , and its reduced case (i.e., Gumbel corresponds to GEV with ξ = 0).The latter one becomes more parsimonious, and its exponential decay tail might be appropriate to fit extreme air quality (Deng and Zhang, 2018).The Gumbel model is generally needed when the uncertainty of ξ being non-zero arises in the GEV model with an estimate of ξ close to zero.

Bayesian joint model with sharing effects
In order to identify the potential varied effect levels of main explanatory variables, with inspiration from applications of the joint model with sharing effects on wildfire (Koh et al., 2021), we model both moderate and extreme PM 10 pollution simultaneously in two respective sub-models linked by the sharing effects and the sharing coefficients (scaling factors).
Let y mean (s, t) denote the logarithm transform of annual mean PM 10 at location s ∈ S and year t ∈ T .We perform the Gaussian sub-model and Gumbel sub-model on annual mean and annual maxima simultaneously, with the structure of the best extreme value model as Model 1 according to the model fitness and prediction analysed in Section 4.1.
where the terms β S and u S (s, t) with superscript S denote the sharing effects, and x S (s, t) are corresponding variables (geographical and meteorological).The term β NS with superscript N S denotes the non-sharing effects with corresponding covariates x NS (s, t) which includes the intercept, location variables and human-effect variables (longitude, latitude, population density).For the selection of sharing and non-sharing variables, to avoid the potential issue of uncertainty of significant sharing effects (β S = 0), we treat two significant predictors (altitude and precipitation) as sharing terms to investigate the potential similar and reverse effects by the ratios (β Gaussian-Gumbel

1
) while considering all other variables, including three non-significant ones (temperature, vapour pressure, and population density), and location variables as non-sharing terms.
Additionally, the spatio-temporal random effect is also treated as sharing effects with respect to the simplicity of computation.The sharing effects β Gaussian-Gumbel 1 and β Gaussian-Gumbel 2 scale the common components of covariate vector x S (s, t) and random effect u S (s, t), and control how much information is shared from the average predictor towards the annual max predictor, and determine the strength of interaction between the two processes.Precisely, it allows for capturing the positive or negative correlations.For further explanation, the sharing effects (including fixed effects and random effects) are the same in two submodels (β S , u S (s, t)).Meanwhile, they are linked by the sharing coefficients (scaling factors) β Gaussian-Gumbel 1 and β Gaussian-Gumbel 2 .On the one hand, these sharing coefficients relax the strictly equal relation between the sharing effects in two sub-models.On the other hand, more importantly, their posterior distributions can also measure the similarities and differences in the effects of predictors in the sub-models.For instance, a significantly negative β Gaussian-Gumbel 1 implies that the corresponding predictors oppositely influence the moderate and extreme air pollution cases.

Priors definition
In a Bayesian context, in order to finalize the model, we need to define prior distributions for the remaining parameters in Gumbel (σ Gumbel ) and GEV distributions (σ GEV , ξ GEV ), the regression coefficients (β), the sharing coefficients in the joint model (β Gaussian-Gumbel ), parameters in the Matérn covariance function (σ M , ρ M , ν M ) and the parameters in AR(1) dynamic model (a, τ AR ).
We use vague Gaussian priors for the tail parameter (ξ GEV ) in GEV distribution and the elements of coefficients (β, β Gaussian-Gumbel ).The smooth parameter ν M is treated here as a fixed value with ν M = 1, as in most spatial analyses.The parameters σ M and ρ M in the Matérn function and autocorrelation parameter a in AR(1) model are defined by penalized complexity (PC) priors (Simpson et al., 2017) with knowledge from Moraga (2019) and Fuglstad et al. (2019).PC prior for the range parameter (ρ M ) is defined with Prob ρ M < 10 4 = 0.01, which means the probability that the range is less than 10km is very small, and the PC prior for variance parameter as Prob (σ M > 3) = 0.01, indicating the probability for variance greater than 3 is low.Similarly, we apply the auto-correlation (a) PC prior following the recommendation of Prob (a > 0) = 0.9.

Model evaluation, diagnosis and cross-validation
Traditional Bayesian model performance is evaluated by two popular criteria, the deviance information criterion (DIC) and the Watanabe-Akaike information criterion (WAIC).The deviance information criterion (DIC) proposed by Spiegelhalter et al. (2002), is a popular criterion for model choice similar to the Akaike information criterion (AIC).DIC = D( θ) + 2p D , where D( θ) is the deviance function with Bayes estimate θ, and p D is the effective number of parameters.The Watanabe-Akaike information criterion, also known as the widely applicable Bayesian information criterion, is similar to the DIC, but the effective number of parameters is computed in a different way (Watanabe, 2013).
However, DIC may under-penalize complex models with many random effects.Alternatively, for prediction performance, INLA suggests applying the leave-one-out cross-validation criteria, conditional predictive ordinates (CPO; Pettit, 1990) with its summative version, the Logarithmic Score (LS; Gneiting and Raftery, 2007) and predictive integral transform (PIT; Marshall and Spiegelhalter, 2003), which facilitates the computation of the cross-validated log-score for model choice, and enables the calibration assessment of out-of-sample predictions, respectively.
where y obs i denotes the i-th observation and y −i denotes the observations y with the i-th component omitted.A model with a small value of LS and a standard uniform distributed PIT is preferable (Gómez Rubio, 2020).
Note that numerical problems may occur when CPO and PIT values are computed with the complicated Gaussian process with INLA algorithm (Held et al., 2010).Hence, we take a few other criteria introduced in Bayesian literature into account: The coverage probability of 95% CI is computed as the proportion of the validation observations that the observed value lies between the 2.5% quantile to the 97.5% quantile of the predicted value (posterior distribution).The correlation coefficient denotes the correlation between observed values and predicted values in the validation set.The root mean square error (RMSE) is defined as the square root of the second sample moment of the differences between predicted values and observed values.
As we separate the original five-year dataset (2017-2021) into training (2017-2020) and validation ( 2021) sets, we decide to apply DIC, WAIC, LS, PIT and RMSE to compare the model fitness on the training set and use the coverage probability, correlation coefficient, and RMSE for predictive ability evaluation.

Results
In this section, we firstly compare the performance of the Bayesian generalised extreme models and select the best one with the outstanding predictive ability (Section 4.1), then summarize the corresponding posterior estimates for both fixed and random effects (Section 4.2).The analysis and interpretation of the joint model are included in Section 4.3 and followed by the excursion functions generation with severe air pollution risk ranking (Section 4.4).Note that since the explanatory variables are measured in different scales, to avoid numerical problems, each predictor is standardized to have mean zero and unit standard deviation.

Model comparison
In order to examine the fitting and prediction ability of the extreme value models, we apply some evaluation criteria (DIC, WAIC, LS, PIT and RMSE) on the training set (Table 3), and use other criteria (coverage probability, correlation and RMSE) on the validation set (Table 4).We see from Table 3 with model performance on the training set, Models 2 and 4 outperform in DIC (36.27 and 73.43,respectively), WAIC (193.67 and 223.06,respectively) and RMSE (0.24 and 0.18, respectively).However, Model 1 is preferable in the leave-one-out cross-validation criteria with LS (8019.07)and PIT plots (Figure 3).Table 4 shows that all four models are comparable in the coverage probability, correlation and RMSE.
Figure 4 shows the simultaneous visualization of model performance on both training and validation sets, where the scatters' distribution along the line with intercept 0 and slope 1 indicates the estimated (predicted) values are best suited to the observations.All four models exhibit a strong fit for the trend.
To summarise, no model excels in all criteria.Considering the similar performance of all four models on the validation set, we choose the parsimonious model, Model 1 (the Gumbel model with fixed effects and spatio-temporal random effects defined in Eq.( 1)), as the best model and bring its structure into further analysis.(1), ( 3)-( 5) subsequently.The scatters distributed along the line with the intercept 0 and the slope denote the better model.

Summary of Model 1
The summary statistics for fixed effects are shown in Table 5.We see that altitude and population density are significantly negatively associated with annual maxima PM 10 concentrations, whereas the effects of other covariates are not statistically significant.High temperature, low precipitation and low vapour pressure are likely to associate with extreme PM 10 concentrations, consistent with Kalisa et al. (2018) and Li et al. (2014).
The negative association between population density and annual maxima PM 10 concentrations is different from the positive correlation found in Table 2 and the positive association demonstrated by Borck and Schrauth (2021).Together with the facts of relatively low correlation for population density (Table 2) and the phenomena that severe PM 10 pollution in 2017, 2020 and 2021 (Figure 2) is usually associated with high temperature and low precipitation, our results probably imply that the generation and spread of extreme PM 10 concentrations depend heavily on the climate conditions.This evidence coincides with the findings of the overwhelming role of adverse meteorology in severe air pollution events (Wang et al., 2020;Morawska et al., 2021).
The spatial and temporal dependence is accessible by spatial heat plot (Figure 5) and posterior estimates of autocorrelation coefficient (a in Table 6).Spatially, similar values of mean random effects occur in groups, especially, a large cluster of high values happens in the centre of the mainland.Temporally, the estimated mean correlation coefficient (a) is 0.80 with 95% credible interval (0.74, 0.85), providing evidence of relatively strong dependence between two consecutive years.

Results of the joint model with sharing effects
Combining the best extreme value model (Model 1) with the Gaussian model, we establish the joint model in Section 3.2 with sharing effects to estimate both maxima and mean concentration levels simultaneously.We keep the two significant effects of predictors (altitude and precipitation) as sharing effects, and all other effects (temperature, vapour pressure, population density and location) are treated as non-sharing ones.Table 7 summarizes the posterior estimates of these effects from the joint model.Precipitation shows significant negative associations with both annual mean and maxima concentrations, which is consistent with the notion that PM 10 concentrations are generally low in wet regions  (Li et al., 2014).In contrast, we find that altitude is negatively connected with the mean but positively connected with the maxima.This is also supported by the detailed sharing coefficients analysed below.We see from Figure 6 that the posterior distribution of the sharing coefficients of precipitation (see detailed definition of β Gaussian-Gumbel 1 in Section 3.2) almost lies between 0 and 1, showing that the influences of precipitation are similar on extreme and moderate air pollution.The coefficient for altitude is negative, suggesting certain evidence that altitude can impact reversely in different scaled pollution.
Furthermore, the effect of temperature is not significant with respect to 95% credible interval for mean but becomes significant for maxima.This result implies that high temperatures weakly impact the general air quality, while playing an essential role in the dispersion of extremely poor pollution.Nevertheless, considering the possibility of unmeasured confounders potentially distorting the coefficients, the strength of the aforementioned inference might be compromised.
In terms of model performance evaluation, the joint model demonstrated promising results in the validation set.In particular, the Gumbel sub-model (max sub-model) exhibited improvements in the coverage probability (88.63%) and correlation (46.71%), compared with the univariate Gumbel model (Model 1), as shown in Tables 4 and 8.The insights of the joint model performance are depicted in Figure 7, where both the performance of the Gumbel model (maxima model) and the Gaussian model (mean model) are satisfied.Because of this, despite the less parsimonious nature of the joint model, which may introduce more volatility in the prediction of the Gumbel sub-model, the overall satisfying evaluation results bolster the credibility of the joint model's findings.With this confidence, we proceed to utilize the joint model for subsequent predictions using excursion functions.

Annual maxima prediction and excursion functions
Hot-spot region identifications and predictive concentration level plots are important tools in air pollution studies, because they intuitively imply the air quality in specific regions and are easily interpreted to environmental agencies and the general public.Bolin and Lindgren (2015) proposed the positive (E + u,α (X)) and negative excursion sets (E − u,α (X)) that determine the largest set that simultaneously exceeds or below the risk level (u) with a small error probability (α), employing a parametric family and sequential importance sampling method for estimating joint probabilities.To visualize the excursion sets simultaneously, we apply the positive and negative excursion functions, denotes the "smallest" α required that the location (s 0 ) can be included into the positive excursion set E + u,α at the first time, while the higher 1 − inf α | s 0 ∈ E + u,α reported by positive excursion function generally indicates higher probabilities for the location (s 0 ) to exceed the risk threshold u simultaneously.In our case, to discover the areas that are most likely and most unlikely to suffer from severe PM 10 pollution simultaneously, we utilize predicted PM 10 concentration levels from the max sub-model of the joint model to generate both positive and negative excursion functions with the thresholds 50µg/m 3 (poor) and 100µg/m 3 (very poor) at 548 locations distributed throughout the mainland of Spain, including a set of 0.5°× 0.5°(50km × 50km) grids (206 locations) and locations of all PM 10 stations (342 monitors).
In the case of exceeding 50µg/m 3 (Figure 8), the probability for simultaneously exceeding is high in the northwest, middle, and south, meaning that poor PM 10 pollution probably hazard these locations during the year.In contrast, the negative excursion function with threshold 50µg/m 3 indicates that the regions in the north and east enjoy good or moderate air quality throughout the whole year.In the case of 100µg/m 3 (Figure 9), the probabilities for very poor PM 10 pollution occurrence are low (in white) in most regions, but still likely to appear in certain areas in the community of Madrid, meanwhile, most areas in the north, northeast and east are expected to be below this threshold.

Discussions and Conclusions
In this paper, we discovered the spatio-temporal patterns of PM 10 concentrations levels in mainland Spain under the framework of INLA-EVT methodology.The high-quality data-set of annual maxima and annual mean of daily PM 10 concentration levels was jointly studied successfully, due to the following three reasons.Firstly, Spain is a mountainous country with a large central plateau, and this complicated orography is usually associated with a variety of climatic conditions.Secondly, the air pollution monitors distributed throughout mainland Spain provide high-quality time  series data, which supports both local and national level accurate prediction and credible inference.Finally, Spain generally enjoys good air quality (low annual mean), but extreme air pollution does appear on certain days during the year (high annual maxima).This circumstance allows us to investigate the potential difference in the generation and spread of the moderate and extreme cases.
We establish a series of Bayesian spatio-temporal models on extreme PM 10 concentrations in Spain, with specifically meteorological predictors, human effects, and spatio-temporal random effects following SPDE and AR(1) dynamic to account for the dependence not explained by covariates.We also provide evidence of similar and reverse connections between influential predictors and different scaled PM 10 concentrations by the Bayesian joint model with sharing effects, as well as generate the annual maxima excursion functions maps specified at the grid level to highlight the regional risk ranking.
Although most statistical studies focus on moderate cases with long-term exposure, in the epidemiological field, shortterm exposure to severe particulate matter is considered as an essential public health issue with major acute cardiovascular problems and health economic consequences.For example, Brook et al. (2016) emphasized that people living in highly polluted regions probably increase heart failure hospitalisations and cardiovascular mortality several folds.Shah et al. (2013) also pointed out that even modest improvements in air quality are projected to have major population health benefits and substantial health-care cost savings, preventing thousands of heart failure hospitalisations and saving millions of US dollars a year.Combined with these health and economic hazards, the main findings in this paper are expected to provide in-depth knowledge of extreme air pollution spreading, promote awareness of extreme value studies, and provide suggestions to the national governments regarding the legislation of extremely poor air quality regulation and human health protection.In particular, the joint model incorporating sharing effects emphasizes the potential reverse or similar impacts of altitude and precipitation on both moderate and extreme cases of PM 10 pollution.Moreover, the excursion functions maps indicate that the central region in Spain is more likely to experience severe PM 10 pollution, which can be applied in research of long-term effects and health outcomes in epidemiological studies, such as acute cardiovascular events (Mustafić et al., 2012;Shah et al., 2013) and various types of strokes (Yu et al., 2014).
In conclusion, our study provides valuable insights into the generation and spreading of extreme PM 10 pollution through innovative methods incorporating sharing effects in the joint model.This approach holds promise for future environmental and epidemiological studies in exploring various air pollutants and their relationship with meteorological variables and anthropogenic factors.

Figure 1 :
Figure 1: Study domain together with the spatial distribution of the 342 monitoring sites in red circles.The figure also illustrates the mesh used to build the SPDE approximation to the continuous Matérn field.

Figure 2 :
Figure 2: Spatio-temporal patterns of annual maxima PM 10 concentration levels in year 2017-2021 throughout mainland Spain.The annual data is reported by European Environment Agency (EEA) and shown in the heat map with EEA's air quality category.

Figure 5 :
Figure 5: Heat map of the spatial random effect in 2017 with (a) mean and (b) standard deviation.

Figure 6 :
Figure 6: (a) The posterior distributions plot and (b) the quantiles plot of sharing coefficients.The three nodes in the quantiles plot indicate 0.025, 0.5 and 0.975 quantiles of the posterior estimates.

Figure 7 :
Figure 7: (a) Histogram of PIT for joint model defined in Eq.(6) on training set, (b) the mean sub-model and (c) the max sub-model performance in both training and validation sets.

Figure 8 :
Figure 8: Positive and negative excursion functions with threshold 50µg/m 3 are displayed in (a) and (b), respectively.Annual maximum concentration levels for locations in red are likely to exceed 50µg/m 3 , and concentration levels for locations in cyan are probably below the threshold.

Table 1 :
Description for explanatory variables.PM 10 concentrations are most prevalent in the centre, northwest and southeast, which correspond to the autonomous communities of Madrid, Galicia, Andalusia, and the Region of Murcia, respectively.This spatio-temporal pattern inspires our further investigation of spatio-temporal modelling taking appropriate topography covariates into account, which are stated in the following section.

Table 2 :
Correlation between explanatory variables and annual mean and annual maximum PM 10 on log scale.

Table 5 :
Posterior estimates (mean, standard deviation and quantiles) of the coefficients of the covariates involved in Model 1 specified in Eq.(1).

Table 6 :
Posterior estimates of mean, standard deviation and quantiles of the parameters in Model 1 specified in Eq.(1).The precision (τ G ) for Gumbel distribution, the range parameter (ρ M ) and the standard deviation (σ M ) introduced in Matérn covariance function and the autocorrelation coefficient (a).

Table 7 :
Posterior estimates (mean and 95% credible interval) of the sharing effects and non-sharing effects specified in Eq.(6).

Table 8 :
Sub-models performance evaluation (coverage probability, correlation and RMSE) in the validation set.