Source apportionment of air pollution in urban areas: a review of the most suitable source-oriented models

Notwithstanding the improvements achieved in recent decades through regional and urban scale actions implemented across Europe, air pollution is still a major environmental and health concern for Europeans. The quantitative assessment of the different sources of air pollution in regional/urban areas is crucial to support the design of accurate air quality plans. Source apportionment (SA) techniques are capable of relating air pollutant concentrations to existing emission source activities and regions. The selection of the appropriate source apportionment technique to apply to a given area should take into account the ultimate goal of the study. Despite the growing number of studies that include source apportionment techniques, there is still a lack of works that summarise information on this topic in a systematic way. In this work, a literature review of studies applying SA techniques, published between 2010 and 2021, was performed. Additionally, this review summarizes the differences among the source apportionment techniques, with a focus on source-oriented models, highlighting their purpose and their advantages and disadvantages. Results show that the number of studies using source-oriented source apportionment models has been increasing over the years, with 59% using tagged species methods, 28% brute force methods, and 13% other methods. These source-oriented models have been mostly applied to PM2.5, to assess the causes of air pollution levels.


Introduction
Despite the improvements achieved over recent decades, air pollution is still a major environmental and health concern for Europeans. Across Europe, the levels of air pollutants still exceed the European Union (EU) standards prescribed by the Air Quality Directive (EEA 2020). Due to the geographical concentration of people and economic activities, which results in higher emissions from different sources, air pollution in urban areas is often higher than in other areas of a country (OECD 2020). The most serious pollutants in European urban areas, in terms of harm to human health, are particulate matter (PM) and nitrogen dioxide (NO2). In the EU, 97% of the urban population is exposed to levels of fine PM above the latest guideline levels set by the WHO, published in September 2021 (WHO 2021). Populations living in bigger cities also tend to be exposed to higher concentrations of NO2 due to road traffic emissions (EEA 2021). Negative impacts on respiratory and cardiovascular health and premature deaths are two of the major effects of human exposure to air pollution (WHO 2013; Kelly and Fussell 2015). Estimates of the health impacts attributable to long-term exposure to air pollution indicate that, in 2018, PM and NO2 concentrations were responsible for about 417 and 55 thousand premature deaths, respectively, in the EU-28 (EEA 2020).
Although regional and urban scale actions to reduce air pollution have been implemented across Europe (e.g. Giannouli et al. 2011; Miranda et al. 2015; Borrego et al. 2016), there are still problems that need to be addressed (Thunis et al. 2019). Air pollution hotspots remain in the Po Valley region and Eastern Europe for PM and in most big European cities for NO2 (EEA 2019). One of the key issues is to understand the origin of the pollution (Thunis et al. 2019). For that, the quantitative assessment of the different origins of air pollution in urban areas is crucial to support the design of accurate air quality plans. As indicated in the European Air Quality Directive (EC 2008), this assessment can be made through the application of source apportionment (SA) techniques. In that sense, the Forum for Air quality Modelling (FAIRMODE), a joint response initiative of the European Environment Agency (EEA) and the European Commission Joint Research Centre (JRC), developed a European guide (Mircea et al. 2020) that provides an overview and recommendations for the application of air quality models in estimating source contributions to PM and guides the choice of the most effective mitigation strategies and measures to include in air quality plans.
SA techniques are capable of relating air pollutant concentrations to existing emission source activities (e.g. domestic heating, road transport, industries) and regions (e.g. local, urban, metropolitan areas). They may be based on the measured concentrations of pollutants, known as receptor models, or on chemistry, transport, and dispersion models, known as source-oriented models (Mircea et al. 2020). The selection of the SA technique used to inform about the influence that one or more sources have in a specific area and period of time depends on the purpose of the study. According to Belis et al. (2020), the most reported purposes for SA applications are: (i) to assess the causes of air pollution levels, (ii) to support the design of air quality plans, (iii) to evaluate the impact of abatement measures, (iv) to quantify the contribution of different areas within a country/region, and (v) to quantify transboundary transport.
Despite the growing number of studies that include SA techniques and their relevant contributions to this research field (e.g. Belis et al. 2014, 2020; Hopke 2016; Thunis et al. 2019; Mircea et al. 2020), there is still a lack of works that summarise information on this topic in a systematic way. To overcome this gap, a literature review of studies based on SA techniques was performed in this work. Furthermore, this review summarizes the differences among the SA techniques and highlights their purpose and their advantages and disadvantages. Given the growing concern about air pollution in urban areas, this study also tries to understand which SA approaches are the most appropriate to assess air quality and to support the design of air quality plans in these areas. The final goal was to highlight the research needs for this field of study. The multi-analysis feature of this work, combining a literature review with a content analysis, and focusing on the purpose of air quality management in urban areas, can enrich the existing knowledge, making it innovative and relevant in the context of SA-based studies.
This review is organised as follows. The "Materials and methods" section presents the methodology used. In the "Source apportionment models and techniques" section, the models and techniques used in SA studies, as well as the main advantages and disadvantages of each one, are reviewed. The "Source apportionment applications" section compiles the main applications of the source-oriented models/techniques. Finally, the "Conclusions and recommendations" section summarizes the major findings and discusses the challenges for future SA applications.

Materials and methods
A literature review was performed to gather all the relevant literature to fulfil the objective of this study. This review was based on peer-reviewed papers published in international scientific journals, and was limited to articles in English published between 2010 and 2021. Following the objective of this study, the search included the following keywords: (i) "air pollution" or "air quality" or "atmospheric pollution"; and (ii) "source contribution*" or "source apportionment" or "source oriented"; and (iii) "modelling" or "model*"; and (iv) "urban area*" or "city" or "cities". The search was performed so as to include singular/plural and related words, in the categories "title, abstract and keywords" in the Scopus database (www.scopus.pt, accessed in October 2021). Figure 1 shows that a total of 557 papers were found and their abstracts were carefully read. For the 557 studies found, a simple analysis was carried out in the "Receptor and source-oriented models: advantages and limitations" section, which resulted in the exclusion of 438 papers considered out of the scope of this study. The remaining 119 papers were fully read, and a more detailed content analysis was performed ("Source-oriented models: brute force and tagged species methods" and "Source apportionment applications" sections). A full list of the 119 papers, including the main characteristics of each study, is presented in Table S1 of the Supplementary Material.
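The keyword logic described above (OR within each group, AND between groups) can be sketched as a single search string. The exact string submitted to Scopus is not given in the text, so the `build_query` helper, the variable names and the use of the `TITLE-ABS-KEY` field code are assumptions made for illustration; the language and publication-year filters are not included.

```python
# Keyword groups quoted in the text: OR within a group, AND between groups.
KEYWORD_GROUPS = [
    ["air pollution", "air quality", "atmospheric pollution"],
    ["source contribution*", "source apportionment", "source oriented"],
    ["modelling", "model*"],
    ["urban area*", "city", "cities"],
]


def build_query(groups):
    """Hypothetical helper: assemble one title/abstract/keyword query."""
    clauses = []
    for terms in groups:
        quoted = " OR ".join(f'"{term}"' for term in terms)
        clauses.append(f"({quoted})")
    # TITLE-ABS-KEY restricts the search to title, abstract and keywords.
    return "TITLE-ABS-KEY(" + " AND ".join(clauses) + ")"


QUERY = build_query(KEYWORD_GROUPS)

# Screening counts reported in the text.
papers_found = 557
papers_excluded = 438
papers_fully_read = papers_found - papers_excluded

if __name__ == "__main__":
    print(QUERY)
    print(papers_fully_read)
```

The last lines only restate the screening arithmetic: 557 papers found minus 438 excluded leaves the 119 papers that were fully read.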
To keep the analysis coherent, a data sheet was developed with the author, publishing year and content information, such as the numerical model, SA technique, case study location, scale of analysis, and pollutants analysed. This information was used to perform a detailed analysis, where descriptive statistics were derived for the year of publication, the methodology used, and the pollutants analysed.
Fig. 1 Review process

Source apportionment models and techniques
Receptor and source-oriented models: advantages and limitations
According to the SA models and techniques used, the 557 studies analysed can be divided into two major groups: (i) SA studies based on receptor models and (ii) SA studies based on source-oriented models. Figure 2 summarizes the number of SA studies, according to the model used, during different periods between 2010 and 2021.
According to Fig. 2, the number of studies using SA techniques increased over the last 12 years. Overall, receptor models have been the most popular method for SA studies since 2010. In contrast, few source-oriented model studies have been carried out, accounting for only 21% of all studies. Each method has its own features.
Receptor models, based on mass conservation principles, are used to perform SA by analysing the chemical and physical parameters measured at one or more specific sites (receptors). They include many tools ranging from simple techniques, with elementary mathematical and basic physical assumptions, to complex models requiring data processing (Mircea et al. 2020). Principal component analysis, chemical mass balance and positive matrix factorization are the most applied receptor model techniques (Belis et al. 2014; Hopke 2016). Alternatively, source-oriented models are usually based on the application of air quality models, with Eulerian, Gaussian and Lagrangian models being the most commonly used (Fragkou et al. 2012). With varying degrees of complexity, they try to mimic the physical and chemical processes taking place in the atmosphere in the presence of pollutant emissions. Common source-oriented model approaches are (i) the brute force method, in which separate model runs are performed, each one considering a different set of sources of interest; and (ii) the tagged species method, which earmarks the mass of chemical species to track the atmospheric fate of every source throughout a single model run (Mircea et al. 2020; Belis et al. 2020).
Although SA techniques have advanced considerably in recent years due to the growing interest in the scientific community, both receptor and source-oriented models still have some limitations inherent to their formulation and data availability (Mircea et al. 2020). The advantages and limitations of each SA technique most frequently identified in the analysed studies are compiled in Table 1.
According to Table 1, one of the main advantages of receptor models is the reduced input data set that this SA technique requires, with computing resources and data storage being almost negligible. As a limitation, since receptor models derive information on sources from measured data, they are limited to the sites and time periods for which these data are available. On the other hand, the application of source-oriented models does not require the use of measured data, making it possible to apply them to any location and time period. Another advantage of source-oriented models is the possibility to predict air quality impacts from emission changes, as well as the quantification of the contribution from different activity sectors or the transport of pollutants. However, source-oriented models are limited by the quality of the input data and the formulation of the chemical transport models used. Also, the significant amount of computing resources and data storage can be another limitation of this SA technique.

Table 1 Advantages and limitations of receptor and source-oriented models
Receptor models, advantages:
  It derives information about sources from measured data
  It estimates the contribution of sources for most of the PM chemical components
  It does not require an extensive input data set (e.g. 3D meteorological data, 3D emission data, air concentrations at boundaries)
  It does not require significant computing resources and data storage is negligible
  The uncertainty of the output is estimated
Receptor models, limitations:
  It is limited to sites where monitoring data are available
  It provides information for specific time windows
  Some methods require prior knowledge of the composition of the emission sources
Source-oriented models, advantages:
  It evaluates the contributions of sources in the absence of measured data
  It is possible to predict air quality changes in relation to emissions changes
  The definition of the sources depends on the emission inventories, so it can be detailed in terms of activity sectors
  It quantifies the contribution of transported pollutants
  It is possible to explore the variability in time (with high temporal resolution) and space of source contributions
Source-oriented models, limitations:
  It is limited by the quality of the input data (e.g. emissions, meteorology)
  It is limited by the formulation of the chemical transport model used
  It requires significant computing resources and data storage
Based on the findings of Table 1, and considering the objective of this study to understand which SA approaches are the most appropriate to assess air quality in urban areas and to support the design of air quality plans, the following sections will focus on the analysis of papers with source-oriented models, 119 out of a total of 557 papers.

Source-oriented models: brute force and tagged species methods
The 119 studies with SA source-oriented models, identified in the previous section, can be divided into three major groups, according to the type of method used, namely, (i) brute force method; (ii) tagged species method; and (iii) other methods, such as the path integral method and response surface modelling.
The underlying question related to the SA brute force method is "What would be the reduction in the pollutant concentrations corresponding to a given reduction in the emissions of its precursors?", as stated by Belis et al. (2020). The brute force method consists of running a model simulation with all emissions (baseline) and then performing several additional model simulations, each one varying emissions from a defined activity source and/or geographical location. The difference between the results from the baseline and each additional simulation is considered as the contribution of that source (Mircea et al. 2020). This method, also known as sensitivity analysis, can be used with any numerical model, without the need for a specific model module to execute the brute force approach.
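The procedure described above can be illustrated with a minimal sketch. It assumes a hypothetical linear `toy_model` standing in for a real air quality model, made-up emission values and transfer coefficients, and a full zero-out (100% reduction) of each source; the text only states that emissions are varied, so the size of the perturbation is an assumption.

```python
# Hypothetical emissions per source (arbitrary units), for illustration only.
EMISSIONS = {"traffic": 20.0, "heating": 10.0, "industry": 5.0}


def toy_model(emissions):
    """Hypothetical linear stand-in for an air quality model run."""
    transfer = {"traffic": 0.5, "heating": 0.3, "industry": 0.2}
    return sum(transfer[source] * value for source, value in emissions.items())


def brute_force(model, emissions):
    """One baseline run plus one extra run per source (N + 1 runs in total)."""
    base = model(emissions)
    impacts = {}
    for source in emissions:
        perturbed = dict(emissions)
        perturbed[source] = 0.0  # assumption: the source is switched off entirely
        # Contribution = baseline concentration minus perturbed concentration.
        impacts[source] = base - model(perturbed)
    return base, impacts


if __name__ == "__main__":
    base, impacts = brute_force(toy_model, EMISSIONS)
    print(base, impacts)
```

Because this stand-in model is linear, the source impacts add up to the baseline concentration; with a non-linear model that would not generally hold.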
For the SA tagged species method, Belis et al. (2020) defined the underlying question as "What is the actual mass transferred from a pollutant source to its concentration in a given location and period?". This methodology is designed for SA purposes and labels each precursor at every time step according to its activity source and/or geographical origin, making it possible to quantify the mass contributed by every source/area to the pollutant concentration (Mircea et al. 2020). The tagged species method is based on the mass balance equation, ensuring that the sum of the concentrations corresponding to each source is always equal to the total concentration due to all sources (Yarwood et al. 2007).
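The mass balance property can be illustrated with a toy single-box model in which every source carries its own tagged tracer during one run. This is a sketch only: the `run_tagged_box` function, the first-order loss term, the explicit Euler time stepping and all numerical values are assumptions for illustration and do not reproduce any tagging module used in the reviewed studies.

```python
def run_tagged_box(emission_rates, loss_rate, dt, n_steps):
    """Single run of a hypothetical box model with one tagged tracer per source.

    The bulk concentration and every tag go through the same processes at
    each time step, so the tags should add up to the bulk concentration.
    """
    tags = {source: 0.0 for source in emission_rates}
    total = 0.0
    total_emission = sum(emission_rates.values())
    for _ in range(n_steps):
        # Bulk (untagged) concentration: emission minus first-order loss.
        total = total + dt * (total_emission - loss_rate * total)
        # Each tag receives only its own emissions but the same loss rate.
        for source, rate in emission_rates.items():
            tags[source] = tags[source] + dt * (rate - loss_rate * tags[source])
    return total, tags


if __name__ == "__main__":
    total, tags = run_tagged_box({"traffic": 2.0, "heating": 1.0}, 0.2, 0.1, 100)
    print(total, tags, sum(tags.values()))
```

All sources are apportioned in a single run, and the sum of the tagged concentrations matches the total up to floating-point rounding.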
As observed in the previous section, the number of studies using SA source-oriented models has been increasing over the years. The tagged species method has been the most used in the 119 SA studies analysed, accounting for 59% of all studies, while studies using brute force methods correspond to only 28%. The remaining 13% of the studies use other methods. According to Fig. 3, the use of tagging and brute force methods has been constantly increasing over time, while the application of other approaches is more recent and varies over the years.
The advantages and limitations of brute force and tagged species methods most frequently identified in the analysed studies are compiled in Table 2.
Among the advantages of brute force methods are the ability to evaluate the impact of abatement measures and the possibility to use this SA technique with any numerical model. As a limitation, the brute force method requires the base case run (with all sources) plus as many runs as there are sources to apportion. Also, due to non-linear behaviour, the sum of the source concentrations allocated in each single run can differ from the concentrations of the pollutants in the base case run. On the other hand, the tagged species method can apportion all sources in one single run. Another advantage of the tagged species method is that it can be used to attribute the actual impacts of sources on health and vegetation. However, this SA technique could require additional coding efforts and is dependent on the sectorial detail available in the emissions inventory.
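The non-additivity of brute force results under non-linear behaviour can be shown with a deliberately simple example. The square-root response in `nonlinear_model` is a hypothetical stand-in chosen only because it is non-linear, and the emission values and full zero-out of each source are likewise assumptions for illustration.

```python
import math

# Hypothetical emissions per source (arbitrary units), for illustration only.
EMISSIONS = {"traffic": 9.0, "heating": 16.0}


def nonlinear_model(emissions):
    """Hypothetical saturating response: concentration grows with the
    square root of total emissions."""
    return math.sqrt(sum(emissions.values()))


def zero_out_impacts(model, emissions):
    """Brute force with full zero-out: base run plus one run per source."""
    base = model(emissions)
    impacts = {}
    for source in emissions:
        perturbed = dict(emissions)
        perturbed[source] = 0.0
        impacts[source] = base - model(perturbed)
    return base, impacts


if __name__ == "__main__":
    base, impacts = zero_out_impacts(nonlinear_model, EMISSIONS)
    unexplained = base - sum(impacts.values())
    print(base, impacts, unexplained)
```

In this toy case the base concentration is 5, while the two source impacts are 1 and 2, so their sum (3) does not recover the base case concentration.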
In summary, the tagged species method quantifies the mass that is transferred from the source to the receptor. For that reason, according to Belis et al. (2020), tagged species methods can be grouped under the category of mass-transfer SA. In contrast, the brute force method is a sensitivity analysis that estimates the changes in concentrations that would result from a change in emissions.

Main purposes
The 119 papers analysed in this section reported several purposes for the applications of SA techniques. Table 3 summarizes the main purpose of the SA studies, divided by the source-oriented models used, i.e. tagged species method, brute force method or other, published between 2010 and 2021. Table 3 shows that the largest share of the papers applied SA source-oriented models to assess the causes of air pollution levels (42; ~35%), of which 24 used tagged species methods (e.g. Huang et al. 2012; Wang et al. 2014a, 2019; Yang et al. 2020), 13 used brute force methods (Wang et al. 2014b, 2015; Zhang et al. 2014b; Lu et al. 2019b), and 5 used other methods (e.g. Lee et al. 2014; Qiao et al. 2018). Next, 18 (~15%) papers aimed to quantify the contribution of different areas within a country/region, with 11 of these papers using tagged species methods (e.g. Valverde et al. 2016; Zhang et al. 2017), 5 using brute force methods (e.g. Huang et al. 2018; Wang et al. 2020b), and 2 using other methods (e.g. Zhu et al. 2018). The quantification of transboundary transport was the purpose of 10 papers, but only using tagged species methods (8 papers [...] the vertical circulation of atmospheric pollutants using a brute force method, among others.

Table 2 Limitations
Brute force method:
  Requires as many runs as the sources to apportion plus the run with the base case (all sources)
  Due to the non-linear behaviour, the sum of source contributions may not match the total pollutant mass obtained in the base case (mass is not always conserved)
  Dependent on the sectorial detail available in the emissions inventory
Tagged species method:
  Dependent on the sectorial detail available in the emissions inventory
  Could require additional coding efforts
  For non-linear pollutants, the source contribution cannot be extrapolated to situations different than the modelled case

Pollutants
The pollutants analysed in the SA source-oriented model studies reviewed in this work are summarized in Table 4. Some studies analysed multiple pollutants, so the total number of pollutants may be greater than the total number of studies. SA source-oriented models have been applied to many air pollutants such as PM10 (e.g. Cheng et al. 2013 [...] (2021), who, using the brute force method, quantified the contribution of emissions from major source sectors and source regions of the Indo-Gangetic Plain to local and regional PM2.5 pollution during winter; and Qiao et al. (2021), who quantified the contributions from different sectors and regions to PM2.5 in the Sichuan Basin, using the tagged species method.

Case studies locations
The location of the case studies included in the papers was also evaluated (Fig. 4). To simplify this analysis, the case study locations, which range from one to several cities in more than one country, have been grouped by country. Some papers used multiple case studies, located in different countries, so the total number of case studies may be greater than the total number of studies. Figure 4 shows that the largest share of the case studies (65; ~42%) is located in China. Next, 22 (~14%) case studies are located in the USA, 8 (~5%) in Italy and 6 (~4%) in Portugal. When analysing the location of each case study by SA source-oriented model, no relationship was found between the use of a specific method and the location of the case study.
According to WHO (2016), China, here identified as the most used case study location, is considered the region of the world with the most air pollution-related premature deaths. China is also located in the world region where the highest levels of air pollution are recorded. Despite these relationships, there is no clear relationship between the case study locations of the analysed papers and the places with poorer air quality and, consequently, more associated premature deaths.

Case studies' periods
Figure 5 presents the summary of case studies' periods used in the SA source-oriented model studies analysed.
To simplify this analysis, the periods were clustered in three different types, namely, (i) all year, which includes only studies that used the entire year; (ii) month(s), which includes studies with one or more analysed months, or an entire season, but never an entire year; and (iii) day(s), including studies that analysed short periods and/or pollution episodes (always less than a month) or used representative days to characterize a specific period. Some studies analysed multiple periods (but always from the same type of period clustering), so the total number of periods may be greater than the total number of studies. The case studies' periods belong to years from 1970 to 2020. One of the studies (Guttikunda et al. 2019) also analysed future years, until 2030, where emission projections were considered, but without taking into account the variations in meteorology caused by climate change. Figure 5 shows that about half of the case studies' periods (71, ~49%) are less than a year, ranging from one to several months or an entire season, of which 44 used tagged species methods (e.g. Du et al. 2020; Shen et al. 2020; Bai et al. 2021), 20 used brute force methods (e.g. Wang et al. 2014b; Dolwick et al. 2015; East et al. 2021), and 7 used other methods (e.g. Dunker et al. 2017; Langner et al. 2020). Next, 53 (~37%) case studies' periods have focused on one or more entire years, with 31 of these being analysed with tagged species methods (e.g. Lang et al. 2017; Lonati et al. 2020; Jiang et al. 2021), 15 with brute force methods (e.g. Cho et al. 2012; Wang et al. 2018; Zhou et al. 2021), and 7 with other methods (e.g. Liu et al. 2018; Wang et al. 2020a). Finally, only 21 (~14%) case studies have periods of several days (Martins et al. 2015, with tagged species method; Baker et al. 2016, with brute force method; Liu et al. 2017, with other method).
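The shares quoted above can be cross-checked from the reported counts. The sketch below assumes that the percentages are taken relative to the total number of case study periods (71 + 53 + 21 = 145) rather than to the 119 papers, which is an inference from the numbers and is not stated explicitly in the text; the variable names are illustrative.

```python
# Counts of case study periods reported in the text.
reported_totals = {"month(s)": 71, "all year": 53, "day(s)": 21}

# Per-method breakdown, given in the text only for two of the three types.
method_breakdown = {
    "month(s)": {"tagged": 44, "brute_force": 20, "other": 7},
    "all year": {"tagged": 31, "brute_force": 15, "other": 7},
}

# Assumption: shares are computed over all periods, not over the 119 papers.
grand_total = sum(reported_totals.values())
shares = {
    period: round(100 * count / grand_total)
    for period, count in reported_totals.items()
}

# The per-method counts should add up to the reported totals.
breakdown_consistent = all(
    sum(methods.values()) == reported_totals[period]
    for period, methods in method_breakdown.items()
)

if __name__ == "__main__":
    print(grand_total, shares, breakdown_consistent)
```

Under that assumption the rounded shares agree with the reported values of roughly 49%, 37% and 14%.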

Discussion
Among the 119 papers fully analysed in this review, tagged species methods are the most used SA technique. Although the tagged species method has been applied for several purposes, the most common is the assessment of the causes of air pollution levels. Owing to its formulation, the tagged species method can be an asset in quantifying transboundary transport and the contribution of different areas within a country/region. Brute force methods, less applied in the 119 analysed papers, are also mostly used in the assessment of the causes of air pollution levels. Unlike the tagged species method, the added value of the brute force method is the capability to evaluate the impact of abatement measures, which is key to supporting the design of air quality plans.
Both tagged species and brute force methods have been used for many pollutants such as NO2, O3, and SO2. However, most of the studies focus on PM due to the high number of exceedances and pollution episodes and the greater experience with SA techniques for this pollutant, mostly due to the FAIRMODE work (Mircea et al. 2020), which provides an overview of air pollution SA for PM. The location and time period analysed for each case study vary according to the air quality problems identified by the authors, and no relationship was found between these characteristics of the study and the SA method chosen. In most cases, the location of the case study seems to be related to the affiliation of the authors.
In this sense, the best SA technique for each study must be selected based on its purpose. Due to its ability to quantify transboundary transport, in addition to contributions from several source regions and sectors, in a single run, the tagged species method may be a better choice when the objective of a study is a more complete diagnosis of air quality. On the other hand, if the objective is to evaluate the impact of abatement measures and/or to support the design of air quality management plans, the brute force method must be chosen.

Conclusions and recommendations
Due to the growing number of applications of SA techniques and the lack of works that summarise information on this topic in a systematic way, a literature review of studies, from 2010 to 2021, that include SA techniques was performed in this work. The analysis of the papers shows that the number of studies using SA source-oriented models has been increasing over the years, with tagged species (59%) and brute force (28%) methods being the most used in the 119 fully analysed studies. Other methods (13%), such as the path integral method or response surface modelling, were also applied in some studies. From the content analysis, it is possible to conclude that both SA source-oriented models have been mostly applied to PM2.5, to assess the causes of air pollution levels. Approximately 42% of the case studies are located in China and the most used time period of analysis covers month(s) and/or seasons.
Given the growing concern about air pollution in urban areas, another objective of this study was to assess the main advantages and limitations of each SA source-oriented model, to understand which one is the most appropriate to assess air quality and to support the design of air quality plans in urban areas. The tagged species method appears to be the one that provides a more complete assessment of air quality, identifying contributions from both source regions and sectors, as well as the transboundary transport. However, this method is not capable of evaluating the impact of abatement measures to support the design of air quality plans, which requires the use of brute force methods. Therefore, to better assess air quality and support the design of air quality management plans in urban areas, the two methods must be used in a complementary way.
The final goal of this review was to highlight the needs for future source apportionment applications. One of the recommendations is that longer time periods should be considered in the SA analysis, since results based on day(s), month(s), and season(s) are indicative of the patterns and contributions of that period, but they should not be linearly extrapolated to other specific periods, as they do not fully describe the role of transboundary transport and the defined source sectors and/or regions. In addition, since the dispersion of air pollutants is highly driven by climate-related events and it is expected that climate change will affect future air quality patterns, SA works considering future years, with both emission and meteorological projections, should be carried out.
Authors' contributions S. Coelho: conceptualization, formal analysis, roles/writing-original draft. J. Ferreira: supervision, writing-review and editing. M. Lopes: supervision, writing-review and editing. All authors read and approved the final manuscript.
Data availability Not applicable.

Declarations
Ethics approval and consent to participate Not applicable.

Consent for publication Not applicable.
Competing interests The authors declare that they have no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.