Background

Several studies have documented large spatial contrasts in air pollution in European and US cities [1–4]. Land Use Regression (LUR) modeling has become a popular method for explaining the observed contrasts [5–10], as well as for estimating outdoor pollution concentrations at the homes of participants of large epidemiological studies [11–14]. LUR relies on a spatially dense air pollution monitoring network, in which each site is characterized by a set of potential “predictor variables”, generally derived from Geographic Information Systems (GIS) [7]. In LUR, a regression model is developed which links the air pollution concentrations observed in the network to the most predictive environmental characteristics, such as traffic, land use and population. LUR modeling has been able to explain a moderate to large amount of the spatial variability in concentration, depending on the pollutant, and has been extended to a growing range of pollutants.

Whereas traditionally mostly nitrogen dioxide and particulate matter were modeled [7], a few recent LUR studies have also modeled ultrafine particles [15–21], mostly based on mobile [17, 20] or short-period monitoring [16, 18, 19, 21], and therefore modeling concentrations which are not necessarily representative of the longer-term average. Long-term, spatially resolved monitoring campaigns for ultrafine particles (UFP) have been uncommon because the condensation particle counters typically used for measurements are costly and require daily maintenance. The MiniDiSC devices used in our study could be deployed for longer periods with relatively little maintenance.

Typically, LUR models are developed for single cities, metropolitan areas or regions, and applied within the same area. Some attempts have been presented to combine study areas [6, 10, 22–24], or to transfer LUR models between areas [24–26]. Most studies conclude that locally developed models are preferable to combined-area or transferred models [10, 24–26]. However, it has been noted that combined-area models may allow epidemiological studies to pool data from different areas and better exploit the between-area contrasts, which would substantially increase the exposure range for some pollutants [22]. The difference in background concentration between areas is often mentioned as a source of over- or under-prediction when applying such combined-area models, and has been addressed by including indicator variables [6], area-specific regional background [22], recently satellite-observed background NO2 [23], or (as in this paper) larger-scale dispersion models [10, 27]. Data for the latter two approaches may not be available for all pollutants, but these approaches have the advantage of being able to interpolate between study areas. The availability of common-source predictor data is crucial to the success of transferring LUR models or combining study areas [22, 25, 26]. If the allocation of land use categories differs between areas, or traffic intensity data are obtained from different traffic models, this may compromise the quality of a combined-area model [10] or the ability to transfer models between areas [25].

This paper describes the development, performance and validation of multi-area LUR models for nitrogen dioxide (NO2), particulate matter <2.5 μm (PM2.5), PM2.5 filter absorbance (a marker for diesel exhaust particles; PM2.5 absorbance), particulate matter <10 μm (PM10), the coarse fraction calculated as the difference between PM10 and PM2.5 (PMcoarse), particle number concentration (PNC) and the lung deposited surface area (LDSA) of particles. For NO2, measurements and models cover all eight areas of the Swiss Study on Air Pollution and Lung and Heart Diseases (SAPALDIA). For particle-related markers of air pollution, the study covers four of the eight areas. This paper presents one of the first LUR models representing long-term exposure to PNC and the first model for LDSA of ultrafine particles. Several toxicological studies suggest that the strength of the induced inflammatory response is related to the contact surface area of the pollutant with the alveolar cells, rather than to its mass [28, 29], making LDSA a biologically promising metric of exposure [30]. Moreover, the small size of ultrafine particles facilitates translocation into the bloodstream and uptake into cells [31]. However, due to the lack of exposure data, no epidemiological study on long-term effects of air pollution has evaluated the added value of its use. Furthermore, we evaluate the performance of study area-specific and multi-area NO2 models in predicting concentrations of an independent validation dataset. The LUR models presented in this paper will be applied to the SAPALDIA study population to estimate subjects’ exposure to the above pollutants. The SAPALDIA cohort was initiated in 1991 and had follow-ups in 2002 and 2011 (SAPALDIA 3) [32, 33].

Methods

We developed LUR models for biannual average concentrations of NO2 for eight areas in Switzerland and, due to financial constraints, for PM2.5, PM2.5 absorbance, PM10, PMcoarse, PNC and LDSA for only four areas, using a range of geographic predictor variables. NO2 models were developed for all eight areas combined, for each area separately, and for the alpine and non-alpine regions separately. All predictors were available nationwide. We used a supervised stepwise method to develop LUR models, maximizing the explained variance of the model while allowing only “plausible” directions of effect (e.g. an increase in concentration with traffic, a decrease in concentration with proximity to green space).

Air pollution measurement data

The SAPALDIA 3 measurement campaign and results have been described previously [34]. Briefly, eight study areas (Aarau, Basel, Davos, Geneva, Lugano, Montana, Payerne and Wald) were selected to cover the spatial distribution of SAPALDIA 3 cohort addresses (Additional file 1). Forty NO2 sampling sites were selected in each study area. In Basel, Geneva, Lugano and Wald, PM2.5, PM10, PNC and LDSA measurements were co-located with the NO2 samplers at 20 of the 40 sites. NO2 was measured with passive diffusion samplers (Passam AG, Männedorf, Switzerland), PM2.5 and PM10 were collected on filters using Harvard Impactors, PM2.5 absorbance was measured as reflectance on PM2.5 filters using a smoke stain reflectometer, and PNC and LDSA measurements were conducted with the Miniature Diffusion Size Classifier (MiniDiSC) (Fachhochschule Nordwestschweiz, Switzerland) [35]. All measurements were performed between January 2011 and December 2012. In each study area, regional background, urban background and traffic sites were selected. Because a substantial amount of the spatial contrast is traffic-related, we chose to over-represent street sites (approximately 50 % of the total), including a wide range of traffic intensities and street layouts. In Aarau, Davos, Montana and Payerne, all sites were measured simultaneously, three times for 14 days each in the cold, warm and intermediate seasons. In Basel, Geneva, Lugano and Wald, 20 sites (10 PM + NO2 and 10 NO2-only) were measured simultaneously for 14 days, while the remaining 20 sites were measured during the subsequent 14 days; this scheme was likewise repeated three times in different seasons. A small number of sites for which valid measurements were only available for one season were excluded from the analysis [34], explaining n < 40 for NO2 or n < 20 for PM2.5 or PM10 in some study areas (Table 2).

For each site, results from the three individual measurements were averaged to represent a bi-annual average over the years 2011 and 2012, by adjusting for the long-term concentration observed at a reference site which was centrally located in each study area. The temporal correction factor for each measurement was calculated as the ratio between the biannual mean and the average reference site measurement during the particular measurement period. Thus, it is assumed that the temporal variation observed at the fixed site monitor reflects the seasonal pattern of the entire area. This procedure has been extensively described before [34].
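As an illustration, the sketch below shows how such a reference-site adjustment could be implemented. The data layout (columns 'site', 'start', 'end' and 'conc', and a daily reference-site series) is assumed for this example and does not represent the authors' SAS implementation.

```python
import pandas as pd

def adjust_to_biannual(site_measurements: pd.DataFrame,
                       ref_series: pd.Series) -> pd.Series:
    """Adjust repeated 14-day site measurements to a bi-annual (2011-2012) mean.

    site_measurements: one row per (site, period), with assumed columns
        'site', 'start', 'end' (period dates) and 'conc' (period-mean concentration).
    ref_series: concentrations at the area's reference site, indexed by date,
        covering 2011-2012.
    """
    ref_biannual_mean = ref_series.mean()

    def correct(row):
        # mean at the reference site during this 14-day measurement period
        ref_period_mean = ref_series.loc[row["start"]:row["end"]].mean()
        # temporal correction factor: bi-annual reference mean / period reference mean
        return row["conc"] * ref_biannual_mean / ref_period_mean

    adjusted = site_measurements.assign(conc_adj=site_measurements.apply(correct, axis=1))
    # bi-annual estimate per site: average of the (usually three) adjusted periods
    return adjusted.groupby("site")["conc_adj"].mean()
```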

GIS predictor data

Sampling site coordinates were determined manually in GIS by investigators who had personally visited the sites, ensuring that the position was accurate relative to roads and buildings. For each site, we used GIS to calculate 164 geographic predictor variables. To do this, we made use of digital datasets which were available at the European and national (Switzerland) level. A more detailed description of the predictor variables and how they were derived can be found in Table 1.

Table 1 Description of evaluated predictor variables

The following GIS source data were available:

  1. Building density

Building footprints were available for the year 2008 from Vector25, the digital landscape model of Switzerland [36]. Data covered the whole country plus several km beyond the borders. We calculated the total area covered by building footprints in various buffers (an illustrative sketch of such a buffer calculation is given after this list).

  2. Population density

Aggregated census population data on a 100 × 100 m grid were available for 2011 from the Bundesamt für Statistik (BfS; http://www.bfs.admin.ch/bfs/portal/de/index/dienstleistungen/geostat/datenbeschreibung/volks-__gebaeude-0.html), for Switzerland only. For the border regions, we obtained a 100 × 100 m CORINE (COoRdination of INformation on the Environment) population grid for 2000 for the surrounding countries (http://www.eea.europa.eu/data-and-maps/data/population-density-disaggregated-with-corine-land-cover-2000-2#tab-gisdata). The two grids were combined, adjusting for an average 4.65 % population growth in France, Germany, Italy and Austria over 2000–2011 (Additional file 2).

  3. Land use

Digital land cover (CORINE CLC-2006 Version 13, 02–2010) data were available from the European Environment Agency for the year 2006 for all of Europe including Switzerland. We calculated the surface area of 6 land use categories (high density residential, low density residential, airport, industry, natural and port), following re-classifications previously adopted in the ESCAPE [5, 6] and APMOSPHERE [37] projects.

  4. Digital road network

A digital road network (Vector25) at a scale of 1:25,000 was available from SwissTOPO (Bundesamt für Landestopografie) at the national level for the year 2008, with modeled traffic intensity data for the same year. Road class (32 classes in total) and the percentage of heavy traffic were also available. Roads were classified as major if the traffic intensity was ≥ 5000 vehicles/24 h.

  5. Altitude

A Swiss elevation map with a resolution of 200 m was available from the Bundesamt für Landestopografie (Federal Office for Topography, SwissTOPO; http://www.swisstopo.admin.ch/internet/swisstopo/de/home/products/height/dhm25.html). Sampling points were directly overlaid with this height grid.

  6. Dispersion model

PolluMap Gaussian dispersion model estimates were available for NO2, PM2.5 and PM10 at a 200 × 200 m resolution for the year 2010. These dispersion models have been shown to compare well with measurements, and details of their development have been published previously [38]. Sampling points were directly overlaid with the grids.
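As referenced under item 1 above, the sketch below illustrates how buffer-based predictors such as the building footprint area within various radii could be derived with geopandas. The layer names and buffer radii are assumptions for the example, not the settings used for the actual predictor calculation described in Table 1.

```python
import geopandas as gpd

def footprint_area_in_buffers(sites: gpd.GeoDataFrame,
                              footprints: gpd.GeoDataFrame,
                              radii=(50, 100, 300, 500, 1000)) -> gpd.GeoDataFrame:
    """Total building-footprint area (m^2) within circular buffers around each site.

    Both layers are assumed to share a projected CRS in metres (e.g. the Swiss grid).
    """
    out = sites.copy()
    for r in radii:
        # circular buffer of radius r around every sampling site
        buffers = sites[["geometry"]].copy()
        buffers["site_id"] = buffers.index
        buffers["geometry"] = buffers.geometry.buffer(r)
        # intersect the footprints with the buffers and sum the clipped areas per site
        clipped = gpd.overlay(buffers, footprints[["geometry"]], how="intersection")
        areas = clipped.assign(area=clipped.geometry.area).groupby("site_id")["area"].sum()
        out[f"build_{r}m"] = areas.reindex(sites.index).fillna(0.0)
    return out
```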

Few sites were within 500 m of a port or industrial area, or within 25 m of a major road, resulting in many zero values for the smaller buffers of these land-use classes and traffic variables. Occasionally, such variables are selected as predictors, but their coefficients are then estimated with a lot of weight given to relatively few sites. This leads to high Cook’s distances for the sites in question and often to unstable coefficients which do not hold up in cross-validation, resulting in large differences between the model R2 and the LOOCV R2, as previously noted for other LUR studies [5, 6]. The same happens for predictor variables which include extreme outliers. Therefore, we a priori restricted the set of eligible predictors, eliminating those which 1) had fewer than five sites with a value other than the most common value (typically 0); 2) had a maximum which was more than 3 times the P10–P90 range above P90; or 3) had a minimum which was more than 3 times the P10–P90 range below P10. The latter two criteria are based on the generally accepted criteria for outliers (lower limit P25 − 3 × (P75 − P25), upper limit P75 + 3 × (P75 − P25)), but are less restrictive. The elimination process was repeated for each model, so the set of eligible variables varied with the number and selection of sites included in that model.
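A minimal sketch of these screening rules, assuming the candidate predictors are held in a pandas DataFrame (illustrative only, not the authors' SAS code):

```python
import numpy as np
import pandas as pd

def eligible_predictors(X: pd.DataFrame) -> list:
    """Apply the a priori predictor screening rules described above (sketch)."""
    keep = []
    for col in X.columns:
        x = X[col].dropna()
        # rule 1: at least five sites must differ from the most common value
        n_not_mode = int((x != x.mode().iloc[0]).sum())
        p10, p90 = np.percentile(x, [10, 90])
        spread = p90 - p10
        if n_not_mode < 5:
            continue
        # rules 2 and 3: no extreme values beyond 3 * (P90 - P10) outside P10-P90
        if x.max() > p90 + 3 * spread or x.min() < p10 - 3 * spread:
            continue
        keep.append(col)
    return keep
```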

LUR model development

Linear regression models were developed using a supervised forward selection procedure, first evaluating univariate regressions. The corrected bi-annual average concentrations were evaluated against all eligible potential predictors. The predictor giving the highest adjusted explained variance (adjusted R2) was selected for inclusion in the model if the direction of effect was as defined a priori and the P-value was <0.10, following procedures used before [5, 6]. Subsequently, we evaluated if any of the remaining predictor variables further improved the model adjusted R2 by at least 0.01. Again, we selected the predictor giving the highest gain in adjusted R2, if it had a P-value <0.10 and the expected direction of effect. Additional variables were not selected if they changed the direction of effect of one of the previously included variables. This process continued until there were no more variables which fit the criteria, and improved the model adjusted R2 by at least 0.01.
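The selection procedure can be sketched as follows. This is illustrative Python/statsmodels code rather than the SAS implementation actually used, and the dictionary expected_sign (mapping each candidate predictor to its a priori plausible direction) is an assumption of the example.

```python
import numpy as np
import statsmodels.api as sm

def supervised_forward_selection(y, X, expected_sign, p_max=0.10, min_gain=0.01):
    """Supervised forward selection with a priori direction-of-effect constraints (sketch)."""
    selected, current_adj_r2 = [], -np.inf
    while True:
        best_var, best_fit = None, None
        for var in [c for c in X.columns if c not in selected]:
            fit = sm.OLS(y, sm.add_constant(X[selected + [var]])).fit()
            # require the a priori direction of effect and P < 0.10 for the new term
            if np.sign(fit.params[var]) != expected_sign[var] or fit.pvalues[var] >= p_max:
                continue
            # previously selected variables must keep their direction of effect
            if any(np.sign(fit.params[v]) != expected_sign[v] for v in selected):
                continue
            if best_fit is None or fit.rsquared_adj > best_fit.rsquared_adj:
                best_var, best_fit = var, fit
        # stop when no candidate qualifies or the gain in adjusted R2 is below 0.01
        if best_var is None or best_fit.rsquared_adj < current_adj_r2 + min_gain:
            break
        selected.append(best_var)
        current_adj_r2 = best_fit.rsquared_adj
    return selected
```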

As final steps, variables with a P-value above 0.10 were removed from the model. Furthermore, we checked that all Variance Inflation Factors (VIF) were lower than 3 (to ensure the absence of collinearity), and that Cook’s distance values were below 1, to ensure that no single highly influential observation disproportionately affected a specific variable’s coefficient.
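These final diagnostics could be computed along the following lines (a sketch using statsmodels, not the authors' code):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def final_model_diagnostics(y, X_selected):
    """Final checks on a fitted LUR model (sketch): P-values, VIF < 3, Cook's D < 1."""
    Xc = sm.add_constant(X_selected)
    fit = sm.OLS(y, Xc).fit()
    # variables that would be removed in the final step (P >= 0.10)
    drop_for_pvalue = [v for v in X_selected.columns if fit.pvalues[v] >= 0.10]
    # variance inflation factor for every predictor (intercept excluded)
    vif = {col: variance_inflation_factor(Xc.values, i)
           for i, col in enumerate(Xc.columns) if col != "const"}
    # Cook's distance per observation
    cooks_d, _ = fit.get_influence().cooks_distance
    return {"drop_for_pvalue": drop_for_pvalue,
            "max_vif": max(vif.values()),
            "max_cooks_distance": float(np.max(cooks_d))}
```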

All analyses were done using SAS version 9.4. Moran’s I was calculated to indicate spatial autocorrelation of the model residuals, with weights proportional to the inverse of the squared distance between sites. In addition, for the multi-area models, we tested whether the model residuals depended significantly on study area (PROC GLM, with a class variable for study area).
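For reference, Moran's I with inverse-distance-squared weights can be computed as in the following sketch (a generic implementation, not the SAS procedure used by the authors):

```python
import numpy as np

def morans_i(residuals, coords):
    """Moran's I of model residuals with weights proportional to 1/d^2 (sketch).

    coords: array of shape (n, 2) with projected x/y coordinates in metres.
    """
    z = np.asarray(residuals, dtype=float)
    z = z - z.mean()
    xy = np.asarray(coords, dtype=float)
    # pairwise distances and inverse-distance-squared weights (diagonal set to 0)
    d = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=-1))
    with np.errstate(divide="ignore"):
        w = 1.0 / d ** 2
    np.fill_diagonal(w, 0.0)
    n, s0 = len(z), w.sum()
    # I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2
    return (n / s0) * (z @ w @ z) / (z @ z)
```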

The modelling procedure is described in more detail in Additional file 5, Table 1. In the first phase of our modelling effort, we combined all measurement sites where the pollutant in question was measured to fit a single LUR model for each pollutant. We considered, but did not force in, PolluMap NO2 estimates (for NO2), PolluMap PM2.5 (for PM2.5 and PM2.5 absorbance) and PolluMap PM10 (for PM10 and PMcoarse). This way, we ensured continuity of exposure characterization across areas and reduced the risk of over-fitting, which is higher for models developed on a limited number of training sites [39–41]. No dispersion model predictors were evaluated for the main PNC and LDSA models. In the second phase (model 2, Additional file 5, Table 1), we addressed those multi-area models which did not adequately capture between-area variability (resulting in significant spatial autocorrelation in the residuals): we tried to force PolluMap NO2 estimates into the NO2 model, to consider PolluMap NO2 for the PM2.5 absorbance model, and to consider PolluMap PM2.5 and PM10 for the PNC and LDSA models. In the third phase, we introduced area indicators to account for between-area variance. These indicators were introduced to the model as dummy variables for each site in the area of Geneva, Lugano or Wald (Aarau being the reference). Ultimately, in the fourth phase, we fitted an area-specific NO2 model for each study area to optimally capture local contrasts. In addition, we fitted an alpine model (study areas Davos and Montana) and a non-alpine model (study areas Aarau, Basel, Geneva, Lugano, Payerne and Wald), retaining the option of making model predictions outside of the eight study areas, for sites above and below 1000 m, respectively. This was only possible for NO2, for which 40 measurement sites per study area were available. We did not consider building local models for pollutants with only few sites (≤20) available per study area. The models that were ultimately selected for the epidemiological application are shown in the main paper. The selection process and the disregarded alternative models are shown in Additional file 5.

Validation and quality assurance

Model performance was evaluated in three ways. Firstly, by leave-one-out cross-validation (LOOCV), where each site was sequentially left out from the model while the included variables were left unchanged; each site’s value was then predicted from the model fitted on the remaining n − 1 sites, after which the R2 between observed and predicted values was calculated. Secondly, and only for the models combining multiple study areas, all sites from one particular study area were left out (leave-one-area-out cross-validation, LOAOCV), while the remaining sites from the other areas were used to predict their values; the variables in the model were again left unchanged. The predicted concentrations from the LOAOCV were compared to the measured concentrations, showing the degree of over- or under-prediction per study area.
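Both cross-validation schemes can be sketched with the same routine, which keeps the selected variables fixed and re-estimates only the coefficients on each training subset (illustrative code, not the authors' implementation):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def cross_validate(y, X, groups=None):
    """LOOCV (groups=None) or leave-one-area-out CV (groups = study-area labels).

    Returns the R^2 between observed and cross-validated predictions (sketch).
    """
    y = pd.Series(np.asarray(y, dtype=float)).reset_index(drop=True)
    X = pd.DataFrame(X).reset_index(drop=True)
    folds = pd.Series(range(len(y))) if groups is None else pd.Series(list(groups))
    pred = pd.Series(index=y.index, dtype=float)
    for fold in folds.unique():
        test = folds == fold
        # refit the same variable set on the training sites only
        fit = sm.OLS(y[~test], sm.add_constant(X[~test])).fit()
        pred[test] = fit.predict(sm.add_constant(X[test], has_constant="add"))
    return float(np.corrcoef(y, pred)[0, 1] ** 2)
```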

Thirdly, external validation was performed using a set of routine monitoring stations from the national air quality monitoring network (NABEL) and several cantonal air monitoring agencies. We gathered data from all available stations in Switzerland which had annual average concentrations available for both 2011 and 2012 for NO2 (102 stations), PM2.5 (10 stations), elemental carbon (EC) as a marker of PM2.5 absorbance (the two metrics are known from previous studies to correlate well [3]; 18 stations), PM10 (82 stations) and PNC (13 stations). PMcoarse was calculated as the difference between PM10 and PM2.5 for 10 stations. No routine monitoring data were available for LDSA, since it is not measured as part of any routine monitoring network. The predictor variables selected for the models were also extracted for these stations. To avoid predictions outside of the plausible concentration range, predictor values were truncated to the minimum and maximum values present in the various training datasets which were used to develop the LUR models [42]. Predictors derived from dispersion models were not truncated: these were assumed to reflect regional background pollution levels and were therefore allowed to vary beyond the range present in the training dataset. For PM2.5, PM10 and PMcoarse, LUR models were then applied to all available stations. The PM2.5 absorbance and PNC models, which included area indicators, were applied to all stations which fell within 20 km of any of the measurement sites used to develop the model for that specific area. This resulted in 5 successful predictions for PM2.5 absorbance and 4 for PNC. Reducing the inclusion criterion further (e.g. to 10 km) would have resulted in too few sites (≤3) to allow any comparison. For NO2, we applied the area-specific models to stations within 10 km of the sites used for the modelling, the alpine model to stations above 1000 m and the non-alpine model to stations below 1000 m. Model predictions were then compared to the two-year (2011–2012) measured average concentrations, to best represent the time period for which the models were developed.
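The truncation step could look like the following sketch (column names are placeholders; dispersion-model predictors are passed via `skip` and left untouched):

```python
import pandas as pd

def truncate_to_training_range(X_new: pd.DataFrame,
                               X_train: pd.DataFrame,
                               skip: tuple = ()) -> pd.DataFrame:
    """Clip predictor values at external sites to the training min/max (sketch)."""
    X_out = X_new.copy()
    for col in X_train.columns:
        if col in skip or col not in X_out.columns:
            # dispersion-model predictors (and absent columns) are not truncated
            continue
        X_out[col] = X_out[col].clip(lower=X_train[col].min(),
                                     upper=X_train[col].max())
    return X_out
```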

Sensitivity analyses

To allow a more direct comparison of model performance between pollutants, we additionally fitted an NO2 model for the same four study areas which also contributed to the PM and UFP models (Basel, Geneva, Lugano and Wald). We explored whether applying the local NO2 models within 10 km of the study areas, and alpine and non-alpine models everywhere else, gave the most accurate predictions (in comparison to applying alpine and non-alpine models only, or applying local models within 20 km of the study areas).

Results

Pollutant distribution characteristics are shown in Table 2 and are available in more detail in Eeftens et al. 2015 [34]. For all pollutants, substantial variation was present between as well as within the areas. Within-area contrasts were largest for NO2 and PMcoarse, while between-area contrasts were the dominant source of variability for the PM2.5, PM2.5 absorbance, PM10, PNC and LDSA [34].

Table 2 Distribution of NO2, PM2.5, PM2.5 absorbance, PM10, PMcoarse, PNC (Particle Number Concentration) and LDSA (Lung Deposited Surface Area) concentrations over the measurement sites

The area-specific, alpine and non-alpine LUR models and LOOCV validation statistics for NO2 are presented in Table 3. Combined-area LUR models for PM2.5, PM2.5 absorbance, PMcoarse, PNC and LDSA are shown in Table 4. Pearson correlations between different pollutants are shown in Table 5 for the measured and for the LUR predicted concentrations. Model performance by area and LOAOCV statistics are shown in Additional file 3. Descriptive statistics of the predictor variables used in the models can be found in Additional file 4.

Table 3 Alpine, non-alpine and area-specific LUR models for NO2
Table 4 Multi-area LUR models for PM2.5, PM2.5 absorbance, PM10, PMcoarse, PNC and LDSA
Table 5 Pearson correlations (n) between different pollutants using measured (lower half) and predicted (upper half) concentrations

We observed substantial under- and over-prediction by study area, and significant spatial autocorrelation in the residuals, for the eight-area NO2 model and the four-area PM2.5 absorbance, PNC and LDSA models (Additional file 5, Table 2). These models also showed dependence of the model residuals on study area, and substantial over- or under-prediction bias in some of the areas (Additional file 5, Table 3). We therefore ultimately fitted area-specific NO2 LUR models which could adequately capture local variability. Additionally, we fitted NO2 LUR models for alpine (Davos and Montana) and non-alpine (Aarau, Basel, Geneva, Lugano, Payerne, Wald) areas, which could be applied to predict NO2 exposures for addresses outside of the eight SAPALDIA areas, above and below 1000 m, respectively (Fig. 1). For PM2.5 absorbance, neither the PM2.5 nor the NO2 dispersion-model estimates were selected, since neither explained the between-area variability. For the novel markers of ultrafines (PNC and LDSA), allowing dispersion model estimates for PM10 and PM2.5 to enter the models resulted in no systematic under- or over-prediction by area (Additional file 5, Table 3). However, the inclusion of these dispersion-model estimates did not allow us to capture the spatial variation of ultrafines independently from the mass. To better explain between-area variability in these models, we instead introduced area indicators for PM2.5 absorbance, PNC and LDSA.

Fig. 1

The boundaries of the 10 km and 20 km buffer areas, drawn around the measurement sites which were used to develop the area-specific NO2 LUR models. Black dots represent residential locations of SAPALDIA participants

Combined-area NO2 models for alpine (Davos and Montana) and non-alpine (all other) areas yielded R2 values of 0.53 and 0.65, respectively, and performed similarly in cross-validation (LOOCV R2 = 0.46 and 0.63, respectively) (Table 3). Area-specific NO2 models yielded an average R2 of 0.71 (range 0.52–0.89), which was higher than for the alpine and non-alpine NO2 models. Their LOOCV R2 values were on average 10 % (range 3–15 %) lower than the model R2 values, indicating that the area-specific models were generally less robust than the multi-area ones. Neither the study area-specific nor the alpine or non-alpine NO2 models showed significant spatial autocorrelation. While the model residuals remained associated with study area in the non-alpine model (Table 3), the absolute over- and under-prediction for the different areas was small (Additional file 3). The alternative NO2 models without regional indicators, with PolluMap NO2 estimates, and with area indicators yielded R2 values of 0.52, 0.52 and 0.64, respectively, but either could not adequately explain between-area variability or were unable to predict exposures outside of the study areas (Additional file 5).

Multi-area LUR models explained a moderate amount of spatial variance for the different PM mass fractions: PM2.5 (R2 = 0.57), PM10 (R2 = 0.63) and PMcoarse (R2 = 0.45). Explained variance was higher for PM2.5 absorbance (R2 = 0.81), PNC (R2 = 0.87) and LDSA (R2 = 0.91) (Table 4). For comparison, a model for NO2 based on the same four areas yielded a moderate R2 of 0.61 in sensitivity analyses (Additional file 5, Table 2). LOOCV R2 values were on average 5 percentage points (range 4–7) lower than the model R2 values for all multi-area models, indicating that the models were robust (Table 4). Multi-area models for PM2.5, PM10 and PMcoarse generally performed well in leave-one-area-out cross-validation, predicting similar amounts of within-area spatial variation whether or not the area in question was included in the training dataset (Additional file 3). However, for all pollutants, the under- or over-prediction of absolute concentrations became more extreme if the area in question was left out of the training dataset (Additional file 3). Alternative models for PNC and LDSA which did not consider area indicators were unable to explain between-area variability adequately. Alternative models which considered dispersion model predictors yielded high R2 values of 0.84 for PNC and 0.89 for LDSA; these models were able to predict between areas, but did not allow us to understand PNC and LDSA variability independently from PM mass (Additional file 5).

LUR models were successfully applied to available routine monitoring sites and predicted 52 to 83 % of spatial variability for most pollutants (Table 6). Over- and under-prediction was small for NO2, PM2.5, PM2.5 absorbance and PM10 compared to the predicted concentration ranges. Over-prediction for PMcoarse was larger than for PM2.5 and PM10, likely because of the increased uncertainty that is introduced when subtracting PM2.5 from PM10. Sensitivity analysis showed that applying local NO2 models within 10 km of the study areas, and alpine and non-alpine models farther outside the study areas, outperformed applying alpine and non-alpine models only, or applying local models within 20 km of the study areas (Additional file 6). While the PNC model performed well in LOOCV, there was no association with the routinely measured averages at fixed monitoring sites. The four available routine monitors generally measured higher numbers of particles than our models predicted, and our PNC model predictions did not capture the spatial contrast measured by the four routine monitors well.

Table 6 Results of independent external validation using air pollution data from the routine monitoring sites

Discussion

We developed multi-area LUR models for NO2, PM2.5, PM2.5 absorbance, PM10, PMcoarse, PNC and LDSA with moderate to good explained variance across regions of Switzerland. Models for PM2.5 absorbance, PNC and LDSA explained the highest amount of spatial variance and also performed best in LOOCV. Multi-area models for PM2.5, PM10 and PMcoarse adequately captured between-area contrasts, whereas area indicators were necessary to do so for PM2.5 absorbance, PNC and LDSA. For NO2, study area-specific models were preferable to any LUR models which combined areas, since they were able to explain more of the local spatial variance; however, they were most reliably applied within 10 km of the measurement sites which were used to develop them. LOAOCV validation for the particulate air pollutant models shows that the multi-area models, when based on only three areas, predicted moderate to high amounts of spatial variation in the area which was left out. Validation using independent routine measurement sites shows that the developed models were able to predict moderate to high amounts of spatial variation, except for PNC. For LDSA, we could not verify our LUR model predictions against independent measurements.

LUR models for NO2, PM2.5, PM2.5 absorbance, PM10 and PMcoarse were previously developed based on measurement campaigns for the SAPALDIA study for the years 1993 and 2003 [10] and for the ESCAPE study for the years 2008 to 2010 [5, 6]. The previous SAPALDIA study also identified that fitting a combined-area model for NO2 was challenging because between area variability could not be adequately captured. The models developed in this paper generally performed similarly to previously developed models. Variations in performance may be explained by differences in the site selection procedure, different numbers of sites or the availability of better predictor variables. For a detailed discussion of this comparison and a table summarizing the performance of LUR models from the current and from previous studies in SAPALDIA study areas, we refer to Additional file 7.

There are few LUR models for PNC which can be compared with our model. In Switzerland, only one other LUR model was previously made for PNC [19]. This model was developed for the city of Basel and comprised repeated 20-min measurements at 57–59 measurement sites during three seasons. Median concentrations were log-transformed and a large fraction (50 %) of the (temporal) variability could be explained by the concurrently measured concentration at a nearby urban background site. Subsequently, spatial predictors derived from GIS, and other predictors related to season, meteorology, time and manually observed site characteristics, were able to increase the explained variance to 58–68 % (depending on which predictor sets were evaluated) [19]. We cannot directly compare this model for PNC to ours, since it contained temporal as well as spatial terms and used log-transformed median concentrations. In comparison to PNC models from other countries [15–18, 20, 21], which are purely spatial and used average, not-transformed concentration data, our models performed very well, yielding R2 values of 0.84 and 0.89, respectively. The Dutch PNC model for Amsterdam was the only other study based on weeklong observations per site [15], and had a notably higher explained variance (R2 = 0.67, LOOCV R2 = 0.57) than models which were based on shorter-term averages (10–60 min) [16, 18, 19, 21] or mobile monitoring [17, 20].

To understand the spatial predictors of PNC and LDSA independently of particle mass, our main models did not consider PM mass dispersion model estimates as candidate predictors. However, in a sensitivity analysis, we allowed dispersion model estimates for PM mass concentrations to enter the models, revealing a strong predictive power of the PM mass dispersion model estimate for the local levels of PNC and LDSA alike (Additional file 5). This may appear counterintuitive, given the rather different small-scale spatial distributions of freshly emitted particles (well indicated by PNC) as compared to the spatially more homogeneous PM2.5 and PM10. However, we found large spatial variability between study areas for long-term PNC and LDSA measurements [34]. The high predictive ability of dispersion model predictors in the PNC and LDSA models, and the high correlations in Table 5, show that between-area pollution contrasts are similar for NO2, PM mass, PNC and LDSA. The high spatial correlation between NO2, PNC and LDSA, which all characterize motor vehicle exhaust in the direct vicinity, was previously discussed in Eeftens et al. [34]. The strategy of choosing measurement sites close to residential locations, including side- and back-yards, was aimed at capturing the residential variation of PNC and LDSA, rather than the high peaks along busy roads. Therefore, the spatial variability of PNC and LDSA also showed high correlations with other pollutants, such as PM2.5 and PM10 [34], and could be predicted by dispersion-modelled PM2.5 or PM10 (Additional file 5).

Variables related to traffic and major roads were selected in nearly all models, mostly within small buffer sizes (25 to 100 m). This has also been observed in earlier studies [5–7], and reflects the major impact of traffic on adjacent roads on local air quality, rather than the influence of ring roads and motorways. The dispersion model estimates from the PolluMap PM2.5 and PM10 models appear in the multi-area models for PM2.5, PM10 and PMcoarse, and account for a large fraction of the regional and between-area variability (Table 4). The dispersion model estimates are too coarse to explain much of the local variability within study areas, and were only included in the area-specific NO2 models for Basel, Davos and Wald, and not as the first variable (Table 3). The PM2.5, PM10, PMcoarse and PNC models further included large-buffer land-use variables for urban green and natural land, both serving as a sink of pollutants (negative coefficients). These land-use variables do not appear in the multi-area or local NO2 models. However, several of the local NO2 models include large to medium-size buffers of building density, population density and residential land, representing broader-scale urban activity. In addition, local NO2 models include industrial land use (Aarau and Lugano), indicating local industrial sources. In the Basel and Lugano NO2 models, proximity to the Rhine river and Lake Lugano was relevant as well. The Rhine above Basel hosts the most important port of Switzerland, and almost all the freight transport ships on the Rhine cross the city. The lake in Lugano is also used by boats, including regular boat services for tourists.

We evaluated model performance in three different ways: 1) using leave-one-out cross-validation (LOOCV), which is widely used in previous LUR studies [5–7], 2) leave-one-area-out cross-validation (LOAOCV), as used by a previous study which combined study areas [22] and 3) by independent external validation, where the models were applied to independent sites from a routine measurement network. The second method gives an idea of how well the models predict within-area contrasts in areas which were not used to develop the model. The third method shows how well the models predict independent concentration contrasts at a national level.

Several recent studies pointed out that leave-one-out cross-validation overestimates the true model performance when applied to independent sites, especially if the models were developed based on a limited number of training sites, which increases the risk of over-fitting [39–41]. Our multi-area models are based on a minimum of 67 sites, and our study area-specific models on a minimum of 37 sites, increasing the robustness of the models, which is apparent from the small difference between model R2 and LOOCV R2, as well as from the good results of the LOAOCV and independent external validation.

Independent validation showed that we could predict moderate to large amounts of spatial variability for NO2, PM2.5, PM2.5 absorbance, PM10 and PMcoarse among routine monitoring sites, which were not used for model development. The comparison of our PNC model with the routine monitoring data must be interpreted with caution. In contrast to all other pollutants, the routine monitoring data for ultrafine particles are derived from different types of monitoring instruments, which measure different particle size ranges, and use optical or electric charge measurement principles. Therefore, the absence of agreement between the LUR model predictions and the routine monitoring for PNC likely highlights the limitation of comparing any metrics of absolute concentrations of ultrafine particles that are derived from multiple different instruments. Hence, this finding should not be interpreted as indicating poor quality of the LUR model.

We minimized the risk of over-fitting by setting a priori criteria for inclusion of variables into our model. These criteria were related to the expected direction of the effect, significance, and distribution of the predictor variables, producing models which were plausible, with coefficients that were robust and not dependent on a small number of sites. Strict selection criteria resulted in the inclusion of 3 to 5 predictors for the local NO2, PM2.5, PM10 and PMcoarse models, and 6 to 7 for the PM2.5 absorbance, PNC and LDSA models, which included area indicators. Similar screening strategies were used in other studies [43, 44].

While there are no strict rules for the minimum number of sampling sites needed to develop LUR models, we observed that the model R2 and LOOCV R2 were larger for the area-specific NO2 models (based on up to 40 sites) and the alpine and non-alpine models than for the model fitted on all eight areas. The risk of over-fitting is larger when smaller training sets are used in model building [39–41]. Therefore, we gave preference to the more robust multi-area models for the particulate pollutants wherever systematic bias in predicting the between-area variability was not an issue. In LOAOCV, we show that the separate alpine and non-alpine NO2 models (Additional file 3) are indeed preferable to the NO2 model which combined all eight regions (Additional file 5), with lower over- and under-prediction by study area and better explained variance within each area. The alpine and non-alpine models also predicted better in independent external validation (Table 6). The inability to properly model NO2 for the Alpine valleys with data from the flatter, much more populated areas of Switzerland, and vice versa, may be explained by the distinct topographies and meteorological conditions of alpine and non-alpine regions in Switzerland.

While the spatial distributions of different pollutants were already moderately to highly correlated in the SAPALDIA 3 measurements, we observed that after applying the LUR models to the measurement sites on which they were fitted, the correlations between pollutants became even higher (Table 5). The LUR models all include different predictors, but these predictors are themselves correlated. Using them to fit models “smoothes” the random variation in the measurements for all pollutants in a similar way. To avoid compounding this issue, our main models for PNC and LDSA did not allow dispersion-modelled PM mass concentrations as predictors. The high correlations between predicted concentrations are thus a combination of the rather high spatial correlation of these pollutants (lower left part of Table 5) and an artifact of the modelling process. As a consequence, we may expect similarly high correlations when we apply our models to the residential sites of the SAPALDIA cohort participants. It is important to be aware of such artifacts when using LUR-modelled estimates in epidemiological health analyses, especially when fitting two-pollutant models, for which the collinearity of the estimated exposures will be high. The implication is that epidemiological studies relying on LUR models are limited in their ability to disentangle the health effects attributable to the different pollutants, for instance when looking at PNC and LDSA independently from PM mass.

Because of the increasing number of LUR models developed for different study areas, long-term cohort studies such as SAPALDIA have a multitude of exposure estimates available. To limit multiple testing, we pre-selected the models to be used in the health analyses. Based on our LOAOCV (Additional file 3) and external validation results (Additional file 6), we concluded that long-term exposure to NO2 is best estimated by study area-specific models. We will use these models to predict long-term exposure to NO2 at the SAPALDIA home and work addresses which fall within 10 km of at least one of the measurement sites used to develop the model for that particular study area. For cohort participants who have moved further beyond the original SAPALDIA study areas, we will estimate NO2 exposure based on the alpine and non-alpine models, respectively, which also performed adequately in external validation. The validity of the models presumably decreases with distance from the sites used to develop them. Therefore, an epidemiological sensitivity analysis may assess whether the effect estimates change when participants living further from the measurement sites are excluded. Exposure to PM2.5 absorbance, PNC and LDSA will only be assigned to coordinates within 20 km of the areas for which these models were developed. Exposure to all particulate air pollutants (PM2.5, PM10 and PMcoarse) will be assigned to all home and work addresses by applying the multi-area models developed for these pollutants to all subjects. Since these models are based on measurements made in Basel, Geneva, Lugano and Wald only, it will be necessary to verify in the epidemiological analyses that their application in the remaining four areas, as well as the rest of Switzerland, does not affect the relationship with the health outcomes. This might be achieved by performing a sensitivity analysis on only those participants who live within 10 km/20 km of the measurement sites which were used to develop the models. Importantly, the Basel, Geneva, Lugano and Wald study areas are geographically diverse, which suggests that the models can be applied in a wide geographical context. Moreover, external validation suggests that throughout Switzerland, the PM2.5, PM10 and PMcoarse models explain moderate to high amounts of spatial variation (R2 = 0.83, 0.71 and 0.65, respectively).

Conclusions

We were able to develop a set of LUR models capturing both the between- and within-area variability in long-term pollutant concentrations in the study areas of the Swiss SAPALDIA cohort. Multi-area models for PM2.5, PM10 and PMcoarse performed adequately in LOOCV, LOAOCV and external validation. For PM2.5 absorbance, PNC and LDSA, area indicators were needed to capture the between-area variance; model performance (evaluated by LOOCV, and for PM2.5 absorbance also by external validation) was high. For NO2, applying study area-specific models was preferable to applying combined-area alpine/non-alpine models. The study area-specific, alpine and non-alpine NO2 models, and the multi-area models for the particulate air pollutants, will be applied to derive exposures of participants of health studies such as the SAPALDIA cohort study. However, we observe high spatial correlations between the estimated pollutants, so the ability to fully disentangle their health effects may remain a challenge.