Introduction

Improving air quality is not only beneficial for human health, but it also reduces our impact on climate change (EEA 2020; Feng and Fang 2022). Therefore, actions to reduce air pollutant emissions can have multiple benefits. Air quality models, such as chemistry transport models (CTMs), are valuable tools for assessing the impact of emission reduction strategies and for forecasting pollutant concentrations. Since models are increasingly used for policy support, for example in the frame of the Air Convention or the Ambient Air Quality Directive (EU 2008), assessing model performance and the sensitivity of model responses has become increasingly important.

Although many studies address the impact of chemical and physical processes, initial and boundary conditions, or emissions on absolute concentrations or trends (Curci 2012; de Meij et al. 2009; Dufour et al. 2021; Huang et al. 2020; Huertas et al. 2021; Khan and Kumar 2019; Li et al. 2021; Pernigotti et al. 2012; Thunis et al. 2021b; Vuolo et al. 2009; Yan et al. 2021), dedicated exercises to evaluate the variability of model responses (concentration changes) to local emission modifications are not common. Such an evaluation is, however, key to ensuring robust policymaking, since absolute or relative concentration changes are commonly used to estimate or evaluate the efficiency of air quality plans, particularly within integrated assessment tools (Viaene et al. 2016).

Long-term modelling exercises such as EURODELTA (Bessagnet et al. 2016; Ciarelli et al. 2019; Colette et al. 2017; Mircea et al. 2019; Thunis et al. 2010; Vivanco et al. 2017), AQMEII (Im et al. 2018; Liu et al. 2018; Solazzo et al. 2012, 2013) and CityDelta (Cuvelier et al. 2007; Thunis et al. 2007; Vautard et al. 2007) were designed to evaluate and intercompare model responses to emission changes. With the exception of CityDelta, these exercises mostly focused on continental and regional model responses.

FAIRMODE is the Forum for Air Quality Modelling in Europe, which aims to bring together air quality modellers and users in order to promote and support the harmonized use of models (Miglietta et al. 2012; Monteiro et al. 2018; Kushta et al. 2019; Pisoni et al. 2019), with emphasis on model applications under the European Air Quality Directive. In the context of FAIRMODE, a dedicated intercomparison exercise has been formulated to assess the sensitivity of model responses to emission changes, with a view to assessing and understanding the main causes of discrepancies between models.

The current work refers to an intercomparison platform, rather than an intercomparison exercise, to reflect the fact that the activity is continuous, in contrast to the intercomparison exercises cited above, which took place over a defined period of time. At the current stage of this long-term programme, the goal of this paper is not to provide an in-depth analysis, but rather to highlight the diversity of model responses and identify which processes could drive this variability. The objective of this paper is twofold: (i) to present the FAIRMODE platform and the models involved as a community initiative, and (ii) to evaluate the amplitude of model responses for O3 and PM concentrations, two pollutants that are, respectively, formed and partially formed in the atmosphere.

Design of the intercomparison platform

Overall framework and setup

The focus of this benchmarking platform is on the urban and regional (i.e. sub-national) scales. The setup considers a range of European cities (mostly EU capitals) plus a few larger regions with high levels of pollution. The proposed geographical distribution ensures adequate coverage of Europe to account for the diversity of atmospheric conditions, meteorological particularities and emissions. Theoretical emission changes are applied over the entire urban area (i.e. the large functional urban area as defined by the OECD (OECD 2012)) or regional area (e.g. the administrative region). Note that, as initially designed, the platform is suited to modelling systems from the regional to the urban scale (i.e. it does not address fine-scale traffic site environments or industrial hot spots). By “modelling system”, we refer here to the system composed of the air quality model itself (configured with chemical/aerosol schemes, transport and dispersion algorithms, models for natural emissions of gases and aerosols, etc.) and its associated input data (anthropogenic emissions, meteorology, initial and boundary conditions, etc.) at all relevant spatial scales. By “responses to emission changes”, we mean the concentration change (or delta) resulting from a given reduction of emissions from anthropogenic activities.

Both short-term (ST) episodes and long-term (LT) simulations are considered. Given their limited CPU demand, ST episodes allow users to simulate a larger number of scenarios and to focus on the mechanisms and processes that favour such conditions. Assessing ST episodes also provides information on specific time frames with threshold exceedances or unusually high concentrations, which LT simulations do not address, since they focus on long-term consistency and benefits as well as on air quality indicators. In ST cases, a large number of cities can also be considered in a single simulation, as interactions between cities are less likely over a short episode than over longer periods. In addition, episodes are easier to analyze and interpret than yearly averages, which sometimes include “compensation” processes. On the other hand, episodes generally lead to weaker signals, which might hinder the analysis, whereas yearly average concentrations are the most relevant output in the context of the Ambient Air Quality Directives (EU 2008).

For ST episodes, both winter (mostly for particulate matter (PM)) and summer (mostly for ozone (O3)) episodes were selected. Each episode covers a few days and was selected based on the CAMS (Copernicus Atmosphere Monitoring Service) reports and observational data from the EEA (European Environment Agency) air quality e-reporting (AIRBASE 2022). For LT simulations, emission reductions are applied to cities that are far from each other, so that reductions applied over one city or region do not influence the background levels in another city or region.

As shown in Fig. 1, most cities are EU capitals; in addition, two large regions, in northern Italy (Po Valley) and southern Poland (Malopolska), are selected to extend the analysis to important regional hot spots in Europe. The focus is on the analysis of ground-level PM10, PM2.5, O3 and NO2 concentrations. Other species, such as HNO3, NH3, HCHO, H2O2, SO2, PM speciation and deposited compounds, are also stored for ST episodes to support the analysis but are not a focus of the present study.

Fig. 1 Location of the target domains where emission reductions take place

To evaluate the diversity of responses in real policy support situations, each modelling group used its own setup and input data (emissions, boundary conditions, meteorology, etc.). Moreover, constraining the sources of meteorological data in such an exercise is not fully relevant, because CTMs that are not coupled online with meteorology often recalculate key variables, such as the planetary boundary layer, the vertical eddy diffusion or the vertical wind speed (to keep mass conservation), with their own parameterisations. The only constraints are (i) to simulate the same meteorological year, 2015, and (ii) to apply fixed emission reductions over the same spatial area (“target domain”) as defined in Fig. 1. Table 10 in Appendix 1 gives the exact locations. Throughout the paper, “pollutant” refers to species produced or emitted in the atmosphere, while “precursor” refers to emitted species that can lead to a new pollutant. For instance, O3 and PM10 are considered pollutants, while NOx and VOCs are precursors. The “delta” terminology refers to the difference between scenario and base case concentrations.

Selected scenarios

This intercomparison considered two idealized emission reduction levels, 25% and 50%, applied to two groups of precursors depending on whether the pollutant targeted for reduction is PM or O3. These reduction rates are in line with the emission reductions usually expected to have a substantial impact on concentrations. Using these two reduction levels allows the linearity of the model responses to be investigated. For PM ST and LT simulations, PPM (primary PM), NOx, SOx, NH3 and VOC precursor emissions are reduced, while for O3 ST simulations, only reductions of NOx and VOC precursors are considered. For ST simulations, emission reductions start at 00:00 UTC on the first day and end at 23:00 UTC on the last day, while for LT simulations emission reductions are applied over the entire year. An additional scenario, for both LT and ST analyses, reduces all precursors simultaneously: ALL consists of PPM, NOx, SOx, NH3 and VOC for PM simulations, and of NOx and VOC for ST ozone simulations. This scenario is used to analyze the “additivity” of the effects of emission reductions.
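
To make the scenario matrix explicit, it can be enumerated as below. This is an illustrative sketch with hypothetical names (`scenario_list`, `PM_PRECURSORS`, etc.), not part of the platform; it assumes, as the later linearity analysis of the ALL scenario implies, that ALL is also run at both the 25% and 50% levels.

```python
from itertools import product

# Hypothetical encoding of the idealized scenario matrix described above.
PM_PRECURSORS = ["PPM", "NOx", "SOx", "NH3", "VOC"]  # PM ST and LT simulations
O3_PRECURSORS = ["NOx", "VOC"]                       # O3 ST simulations
REDUCTIONS = [0.25, 0.50]                            # 25% and 50% reductions


def scenario_list(precursors):
    """Return the base case plus (group, reduction) pairs, including 'ALL'.

    Assumption: the combined 'ALL' scenario is run at both reduction levels.
    """
    groups = precursors + ["ALL"]  # 'ALL' = all listed precursors reduced together
    return [("BASE", 0.0)] + list(product(groups, REDUCTIONS))


pm_scenarios = scenario_list(PM_PRECURSORS)
o3_scenarios = scenario_list(O3_PRECURSORS)
print(len(pm_scenarios), len(o3_scenarios))  # prints: 13 7
```

Under these assumptions, each PM modelling system runs 13 simulations per location (base case plus 12 scenarios) and each O3 episode requires 7.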

The selection criteria for episodes favour periods that affect several regions and cities at the same time. In terms of air quality, 2015 experienced the highest maximum daily 8-h mean O3 concentrations of the last 5 years in Central Europe (EEA 2015). This year was also characterized by elevated annual mean PM10 concentrations, and a series of large-scale pollution events affected European air quality throughout the year. For instance, a significant PM10 pollution event took place from 12 to 20 February, affecting most areas in Europe. As shown in Bessagnet et al. (2016), emissions from residential heating, including wood and coal combustion, dominate PM10 pollution levels during winter or early spring episodes. The locations or areas selected for each category of simulations (LT/ST) and pollutant (PM and O3) are summarized in Table 1 (LT), Table 2 (ST/PM) and Table 3 (ST/O3), along with the exact time windows of the simulated episodes.

Table 1 List of selected areas for long-term simulations
Table 2 List of selected cities for simulations of short-term PM episodes
Table 3 List of selected cities and areas for simulations of short-term O3 episodes

Modelling systems

The models involved in this initiative, as well as the versions of each model, are listed in Table 4. As previously mentioned, the modelling teams used their own input data and usual model configuration for their local, national or regional applications.

Table 4 Institutes/universities and models involved

For the simulation of the selected episodes, all models used a spin-up period of several days before each episode. For a given model, the initial conditions (meteorology and chemistry) of short-term episodes are the same for the base case and all scenarios. The models simulate an emission reduction over the target domain defined in Table 10 of Appendix 1. However, each model can simulate concentrations over a larger domain encompassing the target domain, with an appropriate resolution of at least 0.1°. Models can also use a cascade of nested grids to reach the highest resolution in line with the specific configuration needs of the respective modelling system. Table 5 summarizes the main characteristics of the modelling systems and shows the diversity of model configurations.

Table 5 Short model description (*formerly known as ZAMG for Zentralanstalt für Meteorologie und Geodynamik)

Most models are offline, i.e. chemical species, and particularly aerosols, do not interact with meteorology through the radiative budget. Only the WRF-Chem users activated the default online option, which enables interactions between aerosols, radiation and clouds through the coupling of chemistry with meteorology in realistic synoptic conditions. As a consequence of online coupling, the meteorology changes when emissions, and hence concentrations, are modified. According to Cholakian et al. (2023), this can induce an additional impact on the concentration change, as also highlighted over Europe and Asia in recent works (Bessagnet et al. 2020; Zhou et al. 2018). This effect would be amplified by the fact that 4D nudging of large-scale meteorological fields is not activated in any WRF-Chem configuration, allowing the local physics to be further modified by the radiative forcings. Moreover, in some cases, modelling teams in FAIRMODE using CHIMERE or WRF-Chem delivered results at different spatial resolutions over Paris, Madrid and Athens, which also introduces grid spacing effects into our results. Appendix 2 provides a complete description of the model configurations, and Supplementary material A lists the models available for each city/region.

Definition of indicators

Several indicators specifically developed for analyzing modelled concentration changes in response to emission changes are selected for the analysis of the results (Thunis et al. 2015; Thunis and Clappier 2014). They include the absolute and relative potentials and the absolute potency. These indicators are well suited to analyzing the effect of emission reductions because they scale the concentration change with the reduction intensity and with the quantity of reduced emissions, respectively.

The absolute potential (APL) is defined as the difference of concentrations C between a scenario and the base case, normalized by the percentage \(\alpha\) of the emission reduction: \(APL=\Delta C/\alpha\). The relative potential (RPL) is the APL normalized by the base case concentration \(C_{bc}\): \(RPL=\Delta C/(\alpha\, C_{bc})\). The absolute potency (APY) is the difference of concentrations between a scenario and the base case, normalized by the applied precursor emission reduction \(\alpha E\), where \(E\) is the base case precursor emission: \(APY=\Delta C/(\alpha E)\).
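
The three indicators can be sketched in Python as below. This is a minimal illustration, not the benchmarking tool's code: the function names are hypothetical, and it assumes that \(\alpha\) is expressed as a fraction (0.5 for a 50% reduction), that \(\Delta C = C_{scenario} - C_{bc}\), and that \(E\) is the base case precursor emission in tons over the target domain.

```python
def absolute_potential(c_scen, c_base, alpha):
    """APL = dC / alpha, with dC = C_scenario - C_base and alpha a fraction."""
    return (c_scen - c_base) / alpha


def relative_potential(c_scen, c_base, alpha):
    """RPL = dC / (alpha * C_base), expressed here in %."""
    return 100.0 * (c_scen - c_base) / (alpha * c_base)


def absolute_potency(c_scen, c_base, alpha, e_base):
    """APY = dC / (alpha * E): concentration change per ton of precursor removed."""
    return (c_scen - c_base) / (alpha * e_base)


# Illustrative numbers only: base 20 µg m-3, scenario 15 µg m-3, 50% cut of 1000 t.
apl = absolute_potential(15.0, 20.0, 0.5)         # -10.0 µg m-3
rpl = relative_potential(15.0, 20.0, 0.5)         # -50.0 %
apy = absolute_potency(15.0, 20.0, 0.5, 1000.0)   # -0.01 µg m-3 per ton
```

Negative values indicate a concentration decrease, consistent with the sign convention described in the text.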

Mean values and average concentrations above the 95th percentile, computed over the target domains (Fig. 1), were used in the analyses. These indicators can be either negative or positive, indicating a decrease or an increase of concentrations, respectively.
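
The two spatial statistics can be sketched as follows. This assumes concentrations are stored on a (time, y, x) grid with a boolean mask flagging the target domain cells, and that grid cells and time steps are pooled before the 95th percentile is taken. The percentile definition (NumPy's default linear interpolation), the use of `>=` at the threshold and the function name are assumptions, not the platform's documented choices.

```python
import numpy as np


def domain_statistics(conc, mask):
    """Return (mean, mean of values >= 95th percentile) over target cells.

    conc: array of shape (time, ny, nx); mask: boolean array of shape (ny, nx)
    flagging the target domain cells. Time steps and cells are pooled (assumption).
    """
    values = conc[:, mask].ravel()       # select target cells for every time step
    p95 = np.percentile(values, 95)      # NumPy default: linear interpolation
    return values.mean(), values[values >= p95].mean()


# Toy field: one time step, 10 x 10 cells with values 1..100, all in the target domain.
conc = np.arange(1, 101, dtype=float).reshape(1, 10, 10)
mask = np.ones((10, 10), dtype=bool)
mean_all, mean_top = domain_statistics(conc, mask)   # 50.5 and 98.0
```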

An additional indicator of variability (VAR) has been defined to summarize the large amount of data objectively in a single measurable quantity. It is based on the normalized standard deviation of the indicators across model applications, covering not only the APL and APY but also the base case emissions and concentrations. Appendix 3 provides a detailed description of these indicators and explains how they are calculated for the different domains and scenarios.

Results and discussion

The performance of the individual model applications is provided for reference in Supplementary material A and C, while the main outcomes of the intercomparison of model responses are shown here. Model performance was evaluated against available urban background, periurban and rural observations (AIRBASE 2022) for the main pollutants, for both LT and ST simulations. The bias, root mean square error (RMSE) and Pearson’s spatiotemporal correlation are used for the evaluation, which was performed over a common set of stations for each simulation domain.
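
For reference, the three evaluation statistics can be written as below. This is a sketch under stated assumptions: model and observed values are paired arrays over the common station set and all time steps (so the Pearson correlation is spatiotemporal), bias is defined as model minus observation, and missing observations are stored as NaN. The function name `evaluate` is hypothetical.

```python
import numpy as np


def evaluate(model, obs):
    """Return (bias, RMSE, Pearson r) over all valid station/time pairs."""
    model = np.asarray(model, dtype=float).ravel()
    obs = np.asarray(obs, dtype=float).ravel()
    valid = ~(np.isnan(model) | np.isnan(obs))   # drop pairs with missing data
    m, o = model[valid], obs[valid]
    bias = np.mean(m - o)                        # model minus observation
    rmse = np.sqrt(np.mean((m - o) ** 2))
    r = np.corrcoef(m, o)[0, 1]                  # pooled in space and time
    return bias, rmse, r


# Illustrative (time x station) arrays; NaN marks a missing observation.
obs = np.array([[1.0, 2.0], [3.0, np.nan]])
mod = np.array([[2.0, 4.0], [6.0, 5.0]])
bias, rmse, r = evaluate(mod, obs)
```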

Regarding the model responses, the results presented hereafter summarize the analysis of the model outputs in the exercise database for O3 and PM10 concentrations in response to emission changes. To support this first snapshot analysis, a complete report of more than 240 figures is provided in Supplementary material B as a single document listing their captions; these figures are referred to here as Figure SXXX, and only a few of them are cited in the following sections where needed. The following sections summarize the key findings. A fixed colour code is attributed to each model application. The variability of model responses is examined in terms of the amplitude and sign of the concentration changes. Model responses are evaluated over the target domain only (where emission reductions are applied), because it is the area common to all model applications.

Models’ responses to emission changes: ozone

For LT simulations, the modelled base case O3 mean concentrations are in the range of 60 to 80 µg m−3 (Figure S9). For the ST simulations, concentrations above the 95th percentile range from 80 up to 120 µg m−3 (Figure S18), since the episodes occurred in summer when O3 concentrations are generally higher.

Focusing on the ST simulations, reducing all precursor emissions at the same time (VOC and NOx) leads, for all models, to a slight increase in mean O3 concentrations, except over large regions like the Po Valley. Looking at individual reductions (NOx and VOC separately), large regions like the Po Valley, which encompass rural areas, show an overall decrease of O3 concentrations on average (Fig. 2 and Figure S35). In the city of Vienna, the model responses (Figure S13) can have opposite signs depending on the model configuration, with WRFZAMG simulating a slight decrease and EMEP a slight increase of O3 concentrations when NOx and VOC emissions are reduced at the same time. Reducing only VOC emissions (Figures S56 and S57) leads, as expected, to a general reduction of O3 concentrations. The WRFNKUA configurations, however, show a slight increase of ozone concentrations, which cannot be explained directly by chemical processes. WRF-Chem is an online coupled system that integrates chemical and meteorological processes and their interactions, so there are direct and indirect feedbacks between pollutant concentrations and meteorological processes. These effects, which cannot be quantified separately, may be large enough to change the sign of the responses. The largest variability occurs for NOx emission reductions (Figure S35), highlighting the importance of the simulated chemical regime, which can differ between cities and models. Over urbanized areas, most models simulate an increase of mean ozone during episodes when a combined NOx-VOC emission reduction of 50% is applied. This is explained by the stronger role of NOx in VOC-limited areas, where VOC/NOx ratios are lowest: reducing NOx there lowers the titration of O3 by NO. The classical isopleth diagram depicted in Carter et al. (1982), Dodge (1977) and Oke et al. (2017) shows how NOx emission reductions can provide mixed outcomes depending on the chemical regime (NOx- or VOC-limited).
Here, we consider average concentrations; however, as shown in Vivanco et al. (2021), reducing NOx emissions can have opposing effects depending on the metric considered (SOMO35, AOT40, daily maximum, annual values), which can be more or less influenced by night-time and/or daytime conditions.

Fig. 2 Example of a graphical output of the benchmark tool showing the absolute potential for ozone, comparing a reduction over the Po Valley (from the MINNI model results) of NOx (left) and VOC (right) emissions. Reducing NOx emissions can increase ozone concentrations within urban areas, while VOC emission reductions reduce mean ozone concentrations throughout the domain.

However, over the full year (LT simulation), the mean ozone concentrations increase while the average values exceeding the 95th percentile decrease over the Po Valley (Figures S5-S6) when both NOx and VOC emissions are reduced. These NOx/VOC responses over large regions are in line with recent studies in Europe (Clappier et al. 2021) and in China over the Yangtze River Delta region (Mao et al. 2022), as well as with Du et al. (2021) for the central plain of China during the COVID-19 pandemic. As expected, Fig. 2 shows a rather different picture of model responses between rural and urban areas for the MINNI results: mean ozone decreases in both areas when VOC emissions are reduced, while concentrations increase over urbanized areas when only NOx emissions are reduced.

As shown in Fig. 3, for the ST episodes, the 50% reduction of all precursors leads to different responses at the different locations, with magnitudes that depend on the modelling system. For all models, the lowest absolute potential is found for Warsaw when all precursors are reduced by 50%. The other indicators, relative potential and absolute potency, respond in the same way (Fig. 3). The CHIMERE model applied over Paris and Madrid at different resolutions (light blue, purple and dark blue bars) does not display a large variability in the relative and absolute potentials for O3. Over Athens, however, the grid spacing of the model configuration seems to affect the model response to the emission reduction. The median relative potential and absolute potency of NOx and VOC emission reductions are reported in Table 6, showing the opposing effects on O3 concentrations, with higher impacts for the long-term simulations (LT). The absolute value of the potency is of a larger order of magnitude for a NOx emission reduction than for a VOC emission reduction.

Fig. 3 Absolute potential \({APL}_{50\%}\) for the mean O3 values for the short-term episodes with a reduction of 50% for all precursors. At the top of each plot, a small coloured mark is drawn to show that a result is shown for a given model, to avoid any confusion for low absolute values that cannot be distinguished from the x-axis. Where the value overshoots the scale, the value is written on the plot.

Table 6 Median absolute potency (APY) in µg m−3 per ton of emissions reduced and relative potential (RPL) in % for the 50% emission reduction for ozone concentrations

Models’ responses to emission changes: PM10

Applying precursor emission reductions generally leads to a decrease of PM10 concentrations during the episodes, with an absolute potential of −1 to −11 µg m−3 when the reduction is applied to ALL precursors together (Fig. 4). Again, the model responses can be very different. For instance, EMEPE and EMEPG, which share the same modelling setup but use different emissions, give values of −5 and −11 µg m−3, respectively, with corresponding relative potentials of approximately −33% and −50%. Among the individual precursor reductions, NOx shows the most different effects between models and cities/regions. For a NOx emission reduction in AMS012, all models simulate an increase of concentrations, while for MAD021, CHIMCIE01M and EMEPG show contrasting responses. PAR014 and VIE019 have the highest NOx potencies (Figure S118). For the VOC emission reduction, only WRFZAMG in VIE019 estimates a slight increase of PM10 concentrations (Figure S186). These counter-intuitive effects can be explained by non-linear processes in the chemistry schemes, as discussed in Clappier et al. (2021) and Thunis et al. (2021a). The highest response to a VOC emission reduction is found for PAR014, but its relative potential remains smaller than 1% in magnitude. For all case studies, the main driver of the impact is the reduction of PPM, with a relative potential in the range −6 to −60% depending on the model and city.

Fig. 4 Absolute potential \({APL}_{50\%}^{ave}\) for the mean PM10 values for the short-term episodes with a reduction of 50% for all precursors. At the top of each plot, a small coloured mark is drawn to show that a result is provided in the chart for the given model, to avoid any confusion for low absolute values that cannot be distinguished from the x-axis. Where the value overshoots the scale, the value is written on the plot.

Using the CHIMERE model in the CHIMLMD configuration, the absolute potential for the reduction of all precursors together differs between the 9 km and 3 km resolutions for the PM10 episode over Paris (−7 and −6 µg m−3, Fig. 4). For PPM emission reductions alone, the impact ranges between −3.5 and −3 µg m−3 (Figure S142), showing that the resolution affects species involved in both non-linear and linear processes. It is noteworthy that CHIMERE in the different CHIMECIE setup does not display any difference in the response to emission reductions across the various horizontal resolutions. Understanding the impact of resolution requires a dedicated study. However, it should be borne in mind that the resolution affects not only the chemical and physical schemes but also the meteorological fields. Thus, averaging, interpolating or summing meteorological variables from a nested domain onto the corresponding mother domain will generally not give the same value as a direct calculation in the mother domain, which in turn affects the downstream concentration results.
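
The last point can be illustrated with a deliberately simplified toy example (hypothetical numbers, not model output, and not how any of the participating models aggregates fields): for any quantity that depends non-linearly on the fields, here a wind-times-concentration product, computing it on the fine cells and then averaging does not give the same result as computing it from coarse-cell averages.

```python
import numpy as np

# Hypothetical 2 x 2 block of nested (fine) cells inside one mother (coarse) cell.
wind = np.array([[1.0, 3.0], [2.0, 6.0]])    # e.g. a wind component (m/s)
conc = np.array([[10.0, 2.0], [8.0, 1.0]])   # e.g. a concentration (µg m-3)

# Product computed on the fine cells, then averaged onto the coarse cell.
flux_from_fine = np.mean(wind * conc)              # (10 + 6 + 16 + 6) / 4 = 9.5
# Same product computed directly from coarse-cell averages.
flux_from_coarse = np.mean(wind) * np.mean(conc)   # 3.0 * 5.25 = 15.75
```

The difference arises from the covariance between the sub-grid variations of the two fields, which the coarse grid cannot represent.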

At first sight, for the three EMEP configurations used in the ST simulations (Fig. 4), the emission inventory seems to have a significant impact on the absolute potential, particularly over Lisbon, Prague and Warsaw. This will be analyzed in detail in a follow-up study. Owing to the normalization, the relative potential is less affected by the use of different emissions for LT simulations (Figure S75).

The median relative potential and absolute potency of precursor emission reductions are reported in Table 7. The effect of VOC emission reductions is clearly the weakest among all precursors, while PPM emission reductions are the most efficient at reducing PM in urban areas. A noteworthy factor of 20 is observed between the absolute potencies of NH3 and NOx emission reductions, partly explained by (i) the ratio of the molar masses of these species (about 2.7), which react on a molar basis to produce ammonium nitrate, and (ii) the fact that nitrate and nitric acid are in excess over urban areas. Over large domains like Malopolska and the Po Valley, this factor drops considerably, to 10 and even 1 (or less over the Po Valley), possibly because of the rural zones included in the domain and higher ammonia emissions, including surroundings under NH3-rich regimes. Over a highly urbanized area like Paris, the factor reaches 50, emphasizing a general NH3-limited (NOx-rich) regime over this domain (Petetin et al. 2016). In terms of emission reduction efficiency, reducing ammonia over urbanized areas is therefore much more efficient at reducing PM than reducing NOx.
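
The molar-mass part of explanation (i) can be checked with simple arithmetic. A minimal sketch, assuming NOx emission masses are expressed as NO2 equivalent (common practice in emission inventories, but an assumption here) and using standard atomic masses; the factor of 20 is the value quoted above.

```python
# Standard atomic masses in g/mol.
M_N, M_O, M_H = 14.007, 15.999, 1.008

M_NO2 = M_N + 2 * M_O   # assumption: NOx emission masses are NO2 equivalent
M_NH3 = M_N + 3 * M_H

molar_ratio = M_NO2 / M_NH3          # ~2.7, as quoted in the text

# NH3/NOx potency ratio per ton quoted in the text; converting potencies from
# "per ton" to "per mole" multiplies each by the species' molar mass.
ratio_per_ton = 20.0
ratio_per_mole = ratio_per_ton * M_NH3 / M_NO2

print(round(molar_ratio, 2), round(ratio_per_mole, 1))  # prints: 2.7 7.4
```

On a molar basis, a factor of roughly 7 therefore remains, to be explained by other factors such as the excess of nitrate and nitric acid over urban areas mentioned in (ii).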

Table 7 Median absolute potency (APY) in µg m−3 per ton of emissions reduced and relative potential (RPL) in % for the 50% emission reduction for PM10 concentrations

Indicator of variability

To analyze the variability of responses, an additional indicator, \({VAR}_{IND}\), is computed for each of the delta-based indicators defined previously (denoted generically IND), as the standard deviation of IND across the model applications normalized by the mean absolute value of IND:

$${VAR}_{IND}=\frac{\sqrt{\frac{1}{\Lambda }{\sum }_{\lambda =1}^{\Lambda }{\left({IND}_{\lambda }-\frac{1}{\Lambda }{\sum }_{\lambda =1}^{\Lambda }{IND}_{\lambda }\right)}^{2}}}{\frac{1}{\Lambda }{\sum }_{\lambda =1}^{\Lambda }\left|{IND}_{\lambda }\right|}$$
(1)

where \({IND}_{\lambda }\) is the indicator calculated with Eqs. (9) to (12) of Appendix 3 for model \(\lambda\), and \(\Lambda\) is the number of models providing results. The variability is computed only when at least 3 models are available. We decided to normalize the variability by the average of the absolute values of the indicator. Normalizing the standard deviation is not straightforward: one option would be to use the range, but the range is strongly affected by outliers (Dodge 2006). Moreover, since indicators can be negative, using the average of absolute values avoids mean values close to 0, which would strongly affect the normalization.
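
Eq. (1) translates directly into code. A minimal sketch with a hypothetical function name, using the population standard deviation (\(1/\Lambda\)) as in Eq. (1), and returning no value when fewer than three models are available or when all indicator values are zero (the latter guard is an added assumption):

```python
import math


def variability(ind_values, min_models=3):
    """VAR_IND of Eq. (1): population std of IND over models / mean |IND|."""
    n = len(ind_values)
    if n < min_models:
        return None                      # computed only with >= 3 models
    mean_abs = sum(abs(x) for x in ind_values) / n
    if mean_abs == 0:
        return None                      # all models give a zero response
    mean = sum(ind_values) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in ind_values) / n)
    return std / mean_abs


print(variability([1.0, -1.0, 1.0, -1.0]))  # prints: 1.0
```

The alternating example shows why the mean absolute value is used: the signed mean of these four values is 0, which would make a signed normalization undefined, whereas the absolute-value normalization gives a finite variability of 1.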

The variability of all indicators is synthesized in Fig. 5. The indicators are presented in Appendix 3; the variability of each indicator represents the median of all variabilities computed for a group of cities where at least three models delivered results.

Fig. 5 Variability of model responses calculated as in Eq. (1) for indicators related to the emissions, base case concentrations, absolute potential, relative potential and absolute potency for the various reductions of precursors, for PM10 and O3 mean and highest (above the 95th percentile) values (top and bottom panels, respectively). At the top of each chart, the list of regions and episodes is provided. The variability is also provided for the average concentrations and emissions over the target domain of emission reductions. The variability is an average value computed for the group of cities (episodes) mentioned at the top of each chart, computed only when the results of at least 3 models were available.

The indicator of variability is generally higher for responses to emission reductions than for base case concentrations. This is particularly noteworthy for the ozone episodes (only 6% variability in base case concentrations), because the large influence of long-range transport leaves little room for the impact of local reductions, as reported in Boleti et al. (2019) and Bossioli et al. (2007) and verified during the COVID-19 pandemic (Cuesta et al. 2022; Menut et al. 2020). However, even if local emission reductions have little effect, this effect can differ greatly from model to model, leading to a high variability of small values. This finding contrasts with the CityDelta project (Thunis et al. 2007), which reported consistent deltas between models, albeit for a limited number of cities. Even so, in Thunis et al. (2007) and Arunachalam et al. (2006), model responses were sometimes large for some cities, with deltas varying from 1 to 3 ppb for ozone, which is a large range on small absolute values. Also, in the EURODELTA exercise, differences between model responses using the potency indicator (Thunis et al. 2010) were large for both ozone and PM10, but at a coarse resolution of 50 km over Europe.

As shown in Fig. 5, all response indicators (absolute potential, relative potential and absolute potency) show a higher variability (between 20 and 100%) than emissions and absolute concentrations (between 20 and 50%), particularly for NOx and VOC emission reductions, for both ozone and PM10 concentrations.

For PM10 episodes, the variability of concentrations exceeds 20%, while that of the other indicators is often much higher than 100%. In general, the variabilities of the relative potential and absolute potency are lower than that of the absolute potential, because normalizing by concentrations and emissions, respectively, reduces the differences between models. The remaining variability is probably driven by the use of different setups.

For PM10 concentrations, the variability of the mean of the highest values (average values above the 95th percentile) is generally higher than that of the mean values. For ozone, the picture is different, since the highest values usually occur in suburban or rural places, which are not co-located with ozone precursor emissions.

Regarding the base case, while the variability of PM10 concentrations and the precursor emissions has the same order of magnitude (around 20%), for ozone the variability of concentrations (6%) is much lower than the variability of precursor emissions (larger than 50%).

Clearly, NOx and VOC emission reductions induce the highest variability in the indicators for the PM10 episodes, even though the variability in their emissions is often the lowest. In general, reducing all precursor emissions together gives a lower variability than reducing individual precursor emissions, probably because the effects partly compensate each other. The city-by-city variability does not show a systematic pattern, although, for example, Stockholm shows a very high variability for LT simulations for both ozone and PM10 mean concentrations (Figures S213 and S217).

Assessment of linearity and additivity

The 25% and 50% emission reductions are used to calculate a ratio (%) of deviation from linearity (or simply Linearity), defined as:

$$Linearity =100\times \left(\frac{{APL}_{50\%,m}}{{APL}_{25\%,m}}-1\right)$$
(2)

for each precursor emission (denoted by m). Again, perfect linearity corresponds to an indicator value of 0%. Four cases of linearity exist, as shown in Table 8; in some of them, a change of chemical regime can induce a change of sign of the absolute potential. The linearity indicator is defined only for \({APL}_{25\%,m}\ne 0\).

Table 8 The four cases of linearity
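Equation (2) translates directly into code. The Python sketch below assumes, consistently with Eq. (2) returning 0% for a response proportional to the emission reduction, that the absolute potential is the concentration change divided by the fractional reduction α; the helper names and input values are hypothetical.

```python
def absolute_potential(delta_c, alpha):
    """Assumed form APL = delta_c / alpha (alpha = fractional emission reduction)."""
    return delta_c / alpha


def linearity(apl_25, apl_50):
    """Deviation to linearity in %, Eq. (2): 100 * (APL_50 / APL_25 - 1).

    Returns None when APL_25 is zero, where the indicator is undefined.
    0 % means a perfectly linear response; values below -100 % mean that
    APL_50 and APL_25 have opposite signs.
    """
    if apl_25 == 0:
        return None
    return 100.0 * (apl_50 / apl_25 - 1.0)


# Hypothetical concentration changes for the 25 % and 50 % scenarios
apl_25 = absolute_potential(-0.5, 0.25)
apl_50 = absolute_potential(-1.0, 0.50)    # twice the change: linear response
print(linearity(apl_25, apl_50))           # 0.0
print(linearity(apl_25, absolute_potential(-0.8, 0.50)))  # less than proportional: about -20 %
```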

The available scenarios also allow the analysis of the additivity property, by comparing the sum of the impacts of the 50% emission reductions applied separately to each precursor, called “ADD” (\(\sum_{m}{APL}_{50\%,m}\)), with the impact of the combined 50% reduction of all precursor emissions, called “ALL”. Here, we test this property on the absolute potential. To do so, the following criterion, called “deviation to additivity” in % (or simply Additivity), is defined as:

$$Additivity=100\times \left(\frac{{APL}_{50\%,ALL}}{\sum_{m}{APL}_{50\%,m}}-1\right)$$
(3)

If the model is perfectly additive, this indicator equals 0%. The additivity indicator is defined only for \(\sum_{m}{APL}_{50\%,m}\ne 0\), and the different cases are identified in Table 9.

Table 9 The four cases of additivity (ALL and ADD are respectively reductions with all precursors together, and the sum of individual precursor emission reductions)
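Equation (3) can be sketched in the same way. In the hypothetical Python helper below, the separate reductions are passed as a mapping from precursor name to its absolute potential for the 50% scenario; the function name and the input values are illustrative assumptions.

```python
def additivity(apl_all, apl_separate):
    """Deviation to additivity in %, Eq. (3): 100 * (APL_ALL / ADD - 1).

    apl_all:      absolute potential of the combined 50 % reduction ("ALL")
    apl_separate: dict precursor -> absolute potential of its own 50 % reduction
    Returns None when ADD (the sum of the separate potentials) is zero.
    """
    add = sum(apl_separate.values())
    if add == 0:
        return None
    return 100.0 * (apl_all / add - 1.0)


# Hypothetical PM10 absolute potentials for the five precursors (ADD = -5.0)
separate = {"NOx": -1.0, "VOC": -0.5, "NH3": -2.0, "SOx": -0.25, "PPM": -1.25}
print(additivity(-5.0, separate))   # ALL equals ADD: perfectly additive
print(additivity(-5.5, separate))   # combined reduction larger than the sum
```

When all separate potentials share the same sign, a positive value means that the combined reduction achieves more than the sum of the individual reductions, which is how a positive deviation can be read as a benefit of reducing several precursors at once.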

Note that the linearity and additivity indicators are very sensitive to the value of the denominator and can overshoot in some cases for very low values of absolute potential.
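This sensitivity to small denominators can be seen numerically. The following sketch re-implements Eq. (2) and applies the same hypothetical absolute departure from linearity (0.05 in APL units) for decreasing values of APL at 25%; the numbers are illustrative only.

```python
def linearity(apl_25, apl_50):
    """Deviation to linearity in %, Eq. (2); None when APL_25 is zero."""
    if apl_25 == 0:
        return None
    return 100.0 * (apl_50 / apl_25 - 1.0)


# Hypothetical: the 50 % potential always departs from APL_25 by the same 0.05
for apl_25 in (-2.0, -0.5, -0.1, -0.02):
    apl_50 = apl_25 - 0.05
    print(f"APL_25 = {apl_25:6.2f}  ->  linearity = {linearity(apl_25, apl_50):7.1f} %")
```

The same absolute departure yields a deviation of a few percent when the absolute potential is large and above 200% when it approaches zero, which is why indicators computed on very low absolute potentials must be interpreted with caution.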

The results for the deviation to linearity when reducing all precursor emissions are shown in Fig. 6 for ozone and in Fig. 7 for PM10. While PM10 shows quasi-linearity (deviation to linearity between 0 and 5%), mainly driven by the perfect linearity when applying an emission reduction of PPM, the picture is totally different for ozone. Except in London, Madrid and Paris, and to a lesser extent Prague and the Po Valley, the models show a negative deviation to linearity, often beyond − 50% in cities like Vienna and Warsaw. In Warsaw, a value of − 143% is computed for EMEPE, meaning that the sign of the O3 response changes with the strength of the emission reduction. This is due to the change of regime when applying a stronger emission reduction, switching from a VOC- to a NOx-limited regime. Interestingly, applying separate emission reductions to NH3 or NOx emissions shows that, from a 25 to a 50% reduction, the potential reduction of PM10 mean concentrations increases for long-term simulations (Figures S89 and S111). This clearly shows the change of chemical regime depending on precursor availability and aerosol thermodynamics. Indeed, the species primarily affected by the emission reduction reaches, on average, a tipping point where it becomes limiting, which then enhances the efficiency of the reductions. The importance of ammonia emission reductions to limit air pollution has already been highlighted in Europe (Bessagnet et al. 2014a, b), particularly when applied over large domains. NH3 is considered to be in excess in most of Europe. However, for the values exceeding the 95th percentile in Madrid (MAD004), the picture is rather different: a 50% NOx emission reduction appears less efficient at reducing peak PM10 concentrations (Figure S112).
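The link between a linearity value below − 100% and a change of sign follows directly from Eq. (2). Rearranging it and inserting the Warsaw value computed for EMEPE gives:

$$\frac{{APL}_{50\%,m}}{{APL}_{25\%,m}} = 1+\frac{Linearity}{100} = 1-1.43 = -0.43$$

The ratio is negative, so the absolute potentials for the 25% and 50% reductions have opposite signs; the same holds for any linearity value below − 100%.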

Fig. 6

Deviation to linearity (%) for ozone for the ST simulations applying ALL (NOx, VOC) emission reductions

Fig. 7

Deviation to linearity (%) for PM10 for the ST simulations applying ALL (NOx, VOC, NH3, SOx, PPM) emission reductions

The case of Paris for an emission reduction of SOx is noteworthy (Figure S166). While CHIMERE in the CHIMCIE configuration shows quasi-linearity independent of the horizontal resolution, CHIMERE in the CHIMLMD configuration shows a decrease in the efficiency of PM10 concentration reductions when going from 25 to 50% emission reductions, as well as a strong dependence on horizontal resolution. However, for the CHIMLMD configuration, the absolute impact on concentrations is very low. Regarding the impact of reducing NOx, a similar behaviour is observed for CHIMCIE and CHIMLMD, with an amplified potential reduction of PM10 concentrations and higher absolute impacts on PM10 (up to − 0.7 µg m⁻³). At this stage of the analysis, it is not possible to explain the reason for these behaviours or to identify which part of the configuration plays the most important role in explaining the differences: the model version/type, the emission dataset or the non-linear processes involved in secondary PM chemistry. This behaviour might result from computing statistical ratios on very low absolute values, leading to a change of sign, or it may derive from formation subprocesses affected by the concentrations of gaseous precursors, such as the formation of the inorganic species ammonium nitrate and ammonium sulphate. For example, the neutralization of ammonia by nitric acid (formed by NOx oxidation) competes with the formation of the more thermodynamically stable ammonium sulphate, in which ammonia gas neutralizes sulphuric acid aerosols in the atmosphere (Kushta et al. 2021). Any perturbation might enhance or reduce the efficiency of this formation, depending on whether a single precursor or a combination of precursors is reduced.

Regarding the additivity of separate 50% reductions, the results also differ between O3 and PM10 for ST simulations. For PM10, the contributions are rather additive, with a deviation below 15% (Fig. 8) that is often positive, showing a benefit when several pollutant emissions are reduced at the same time. Notably, the WRFZAMG configuration increases the concentration reductions by 50% when applying all emission reductions together in Vienna. This behaviour requires further analysis, since WRF-Chem is used with feedbacks on the meteorology that in turn affect the concentrations. For ozone, additivity is clearly not the rule, as shown in Fig. 9 for the short-term episodes, and also in the Po Valley for the long-term simulation, especially for the highest concentrations (Figure S2). However, for some cities like London, Madrid and Paris, most of the configurations show a rather additive behaviour. When analyzing the difference in impact between the combined NOx and VOC reductions and the sum of individual reduction impacts (Figures S2, S4), the lowest deviations were found for the Brussels and Madrid cases, with full additivity in the EMEPC2, EMEPCE and RIOCHIRC setups. For the Warsaw case (WAR040), as the absolute potential is very low in absolute value (Fig. 3), the deviation to linearity is very large, which clearly shows the limitation of studying such indicators with low absolute values.

Fig. 8

Deviation to additivity (%) for PM10 for the ST simulations applying ALL (NOx, VOC, NH3, SOx, PPM) versus adding all separate contributions (ADD = NOx + VOC + NH3 + SOx + PPM) emission reductions (− 50%) for the absolute potential

Fig. 9

Deviation to additivity (%) for O3 for the ST simulations applying ALL (NOx, VOC) versus adding all separate contributions (ADD = NOx + VOC) emission reductions (− 50%) for the absolute potential

Conclusions and perspectives

This study presents a comprehensive application of the FAIRMODE platform (https://fairmode.jrc.ec.europa.eu) for evaluating the variability of the responses of different air quality models’ applications to prescribed emission reductions over various European cities and two highly polluted areas (Po Valley and Malopolska). Based on standard deviations calculated from model outputs, we have analyzed the variability of several indicators such as APL and APY. The main results can be summarized as follows:

  • Air quality models’ applications show significant differences (variability, as defined in this study, often exceeds 20%) in the concentration changes (deltas) for the 25 and 50% emission reductions;

  • The variability of model responses using delta-based indicators is higher for PM10 than for O3;

  • The variability of model responses to emission reductions is higher than the variability of modelled base case concentrations and of emissions used as input;

  • Relative indicators like the relative potential (normalized by the concentration) and the absolute potency (normalized by the emission reductions) have a lower variability compared with the absolute potential (which is proportional to the delta of concentrations);

  • For O3, the analysis of linearity and additivity of model responses shows a clear impact of non-linear chemistry processes that leads to a large deviation to linearity and additivity of concentrations in relation to emission reductions;

  • For PM, the response is, in general, more linear and additive, particularly, as expected, when reducing the primary emissions of particles which weakly perturb the chemical and physical processes involved in the PM formation;

  • One should be cautious in the interpretation of these indicators because they are built on averages and ratios of values that can be very low and of different signs. More work should be devoted to developing new ones.

This type of exercise may give indications regarding the limits of the efficiency of mitigation measures. Modelling results show that applying emission reductions on several sectors (often related to a main precursor) at the same time seems more beneficial for reducing PM concentrations than reductions of individual precursors.

The lower variability between models due to a normalization by concentrations or emissions reduces the influences of different input data and permits the evaluation of the role of other processes that can explain the variability of concentration deltas. As future work, several additional analyses are planned to disentangle the role of individual processes, differences in setup and input data that give rise to the variability between model responses. The role of emissions, chemistry schemes, meteorology, online/offline coupling strategies and horizontal resolution will be of particular interest to modellers to improve the application of their models for assessing mitigation strategies aimed at improving air quality in cities and regions.

This platform and its application are part of an ongoing programme of work to assess the behaviour of models when applying emission reduction scenarios and to test the robustness of their application for evaluating mitigation strategies to curb air pollution.