Introduction

Aerosol particles affect the radiative budget of the Earth’s atmosphere through scattering and absorption of light (i.e., direct climate forcing effect) and by modulating the formation and properties of clouds (i.e., indirect climate forcing effect) [1, 2]. Aerosols also have serious adverse effects on air quality [3], human health [4], and ecosystems [5]. Organic aerosol (OA, i.e., the organic fraction of particles) accounts for a substantial fraction (~10–90% [6–8]) of the global submicron aerosol burden and thus is a key determinant of aerosol properties and effects. A thorough understanding of the characteristics, sources, and processes of OA is necessary to address aerosol-related environmental issues and to improve the predictive capability of air quality and climate models.

The characterization of OA chemical composition and mass concentration is limited by analytical challenges arising from the fact that atmospheric OA comprises thousands of compounds with vastly different properties such as oxidation state, volatility, and hygroscopicity [7–9]. This compositional complexity of OA is a consequence of the extremely diverse sources and reactions of organic species in the atmosphere [10]. By a broad classification of sources, there are primary OA (POA) emitted directly in particulate form, e.g., from fossil fuel and biomass burning or mechanical processes, and secondary OA (SOA) produced from the oxidation of volatile organic compounds (VOCs) [7]. POA and VOCs are released from various biogenic, biomass burning, and anthropogenic sources [10]; SOA formation occurs via many reaction pathways that convert VOCs into low volatility species [11]. Furthermore, the composition and properties of both POA and SOA may change dynamically throughout aerosol lifetime, because of intertwined processes including emission, oxidation, fragmentation, oligomerization, gas-to-particle partitioning, and cloud processing [12–18].

Factor analysis of time- and compositionally resolved OA data enables the extraction of broad “factors” or “components” representing species that correlate in time. Each factor extracted in this way typically corresponds to many individual molecules and contains information about their sources, processing histories, and/or chemical properties. Several publications have reported factor analysis of speciated OA data from filters [19–22], albeit with low time resolution (typically 24 h), which obscures some of the contrast in concentration variations caused by the dynamics of the sources, chemistry, and transport. In recent years, online aerosol mass spectrometers have enabled chemical analyses of aerosols in real time with high time resolution (seconds to minutes) [23]. A range of mass spectrometers using various particle vaporization and ionization techniques have been developed. The most common designs include thermal desorption followed by electron ionization [24, 25] or other types of ionization [26, 27] and laser ablation [28, 29]. Instruments based on thermal desorption are mainly configured for determining ensemble particle properties averaged over defined time periods, whereas the laser-based instruments are primarily used for single-particle measurements. Single-step laser ablation and ionization detection schemes typically provide qualitative information about aerosol chemical constituents because the observed intensities are highly affected by the actual composition of the detected aerosol particle. Two-step thermal desorption and linear ionization schemes, on the other hand, provide quantitative and linearly additive mass spectra of mixtures (i.e., each mass spectrum that is observed is a linear combination of the responses from individual compounds present in the mixture). As discussed in more detail in the section “Bilinear modeling”, the factor analysis methods discussed in this review are based on linear additivity of constant factor mass spectra. Thus, we focus only on analysis of data from the thermal vaporization-based aerosol mass spectrometers.

Aerodyne Research aerosol mass spectrometers (termed “AMS” hereafter) are the most widely used thermal desorption-based mass spectrometers in aerosol research. The AMS can quantify the mass concentrations of non-refractory (NR) species including sulfate, nitrate, ammonium, chloride, and total organic matter via thermal vaporization (typically at 600 °C) and 70-eV electron-impact ionization (EI) [25]. The distributions of these species as a function of particle size are also determined on the basis of measurement of particle velocities inside a vacuum chamber [30]. The HR-ToF-AMS, i.e., AMS built with a high-resolution time-of-flight mass spectrometer, is further able to determine the elemental composition and oxidation states of organic aerosols [31–33]. From each measurement, the AMS outputs an ensemble MS of OA that is the linear superposition of the mass spectra of individual species weighted by their concentrations. Because most molecules undergo extensive fragmentation during high-temperature vaporization and high-energy ionization inside the AMS, the AMS spectra provide information on the bulk composition of OA with limited molecular detail [25]. Mass spectral fragmentation can be limited by using soft-ionization aerosol mass spectrometers (SI-AMS), which afford increased information about the molecular composition of OA, although at the expense of quantifying the total OA mass and, typically, of a lower signal-to-noise ratio [34–37]. The quantitative mass spectra generated by SI-AMS systems can be used in factor analysis and can provide more detailed information about OA sources.

This review summarizes the methods of factor analysis of fast time-resolved linearly additive mass spectral data, and key results obtained with these methods about primary sources, secondary formation, evolution/aging processes of atmospheric OA, and the global context of OA. One-hundred and twenty-five papers have been published in this area so far, all but three of which focus on Aerodyne AMS data; therefore the review focuses more strongly on AMS results.

Multivariate factor analysis of aerosol mass spectra

An atmospheric field study usually lasts a few weeks to months, during which an aerosol mass spectrometer operates continuously to record the temporal variations of the composition and concentration of the OA in the form of a mass spectral matrix (denoted “ORG” hereafter), i.e., an array of m measured mass spectral ion intensities compiled over t sampling time steps. An ORG from a typical AMS study, for example, usually comprises thousands of ensemble spectra acquired with a time resolution of seconds to minutes.

Bilinear modeling

The objective of multivariate factor analysis is to deconvolve the observed ORG matrix into unique factors (Eq. 1). Factor analysis of the data matrices from quantitative instruments (e.g., AMS and some soft-ionization aerosol mass spectrometers) usually involves solving a two-dimensional bilinear model that expresses mass conservation, such that:

$$ \mathrm{org}_{ij} = \sum\limits_{p = 1}^{P} \mathrm{ts}_{ip}\,\mathrm{ms}_{pj} + e_{ij} $$
(1)

where i and j refer to row and column indices, respectively, in the ORG matrix, \( \mathrm{org}_{ij} \) is the signal of ion fragment j at time step i (for the AMS it is the organic-equivalent mass concentration of that fragment, in μg m\(^{-3}\)), \( \mathrm{ts}_{ip} \) is the concentration of a given factor p at time step i, \( \mathrm{ms}_{pj} \) is the fractional contribution of ion fragment j in the mass spectrum of factor p, and \( e_{ij} \) is the residual not fit by the model for ion fragment j at time step i. P is the total number of factors in the solution. A graphical schematic diagram of the model is shown in Fig. 1.

Fig. 1

Schematic diagram of bilinear factor analysis of a mass spectral matrix of an organic aerosol (ORG). The time series of the factors (\( \mathbf{ts}_n \)) make up the matrix TS (Eq. 3) and the mass spectra of the factors (\( \mathbf{ms}_n \)) make up the matrix MS (Eq. 4). The differences between the measurements and the modeled results are represented as the residual matrix E. (Adapted from Ref. [38]). An example of the factor results obtained from PMF analysis of an ambient AMS dataset is shown in Fig. 2.

Written in matrix form, Eq. (1) and Fig. 1 show that the bilinear model represents the matrix of data points (i.e., ORG, dimensions t × m) as the product of two smaller matrices, one comprising the concentration time series (TS, dimensions t × P) and the other the mass spectra (MS, dimensions P × m) or source profiles of the P OA factors, plus a matrix of residuals (E) to account for the unexplained part of ORG:

$$ \mathbf{ORG} = \mathbf{TS} \times \mathbf{MS} + \mathbf{E} $$
(2)
$$ \mathbf{TS} = \left[ \mathbf{ts}_1, \mathbf{ts}_2, \ldots, \mathbf{ts}_P \right] $$
(3)
$$ \mathbf{MS} = \left[ \begin{array}{c} \mathbf{ms}_1 \\ \mathbf{ms}_2 \\ \vdots \\ \mathbf{ms}_P \end{array} \right] $$
(4)

As illustrated in Fig. 1, \( \mathbf{ts}_p \) is a column vector representing the time series of any given factor p and \( \mathbf{ms}_p \) is a row vector representing its mass spectrum. For the AMS, each ms is normalized to sum to 1 so that all elements in ts have units of mass concentration (μg m\(^{-3}\)). An underlying assumption of bilinear modeling is that each factor has a constant mass spectrum but varying concentration over time. If the true factor spectra are not constant as assumed by the model, E may be significantly larger than measurement errors even after all physically meaningful factors have been extracted.
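The structure of Eqs. (1)–(4) can be illustrated with a small numerical sketch. The matrix sizes, random spectra, and noise level below are arbitrary assumptions chosen only for illustration and are not taken from any dataset discussed in this review. The sketch shows that, because each factor spectrum sums to 1, the modeled organic mass at each time step equals the sum of the factor concentrations.

```python
import numpy as np

rng = np.random.default_rng(0)
t, m, P = 200, 50, 3  # time steps, ions, factors (illustrative sizes)

# Factor mass spectra (P x m): each row is normalized to sum to 1.
MS = rng.random((P, m))
MS /= MS.sum(axis=1, keepdims=True)

# Factor time series (t x P): non-negative concentrations in ug m^-3.
TS = rng.gamma(shape=2.0, scale=1.0, size=(t, P))

# Residual matrix E and the resulting data matrix ORG (Eq. 2).
E = rng.normal(scale=0.01, size=(t, m))
ORG = TS @ MS + E

# Element-wise form (Eq. 1) for one arbitrary cell of the matrix.
i, j = 5, 7
org_ij = sum(TS[i, p] * MS[p, j] for p in range(P)) + E[i, j]

# Modeled total organic mass per time step equals the summed factor mass.
modeled_total = (TS @ MS).sum(axis=1)
```

Here `ORG[i, j]` and `org_ij` agree to floating-point precision, and `modeled_total` matches `TS.sum(axis=1)`.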

Zhang et al. [40, 41] conducted the first bilinear factor analysis of the AMS data using a custom principal component analysis (CPCA) method. CPCA solves Eq. (1) on the basis of an iterative linear-decomposition algorithm that is initialized with the time series of two AMS tracer ions, m/z 44 (mainly \( \mathrm{CO_2^+} \)) and m/z 57 (with a major contribution of \( \mathrm{C_4H_9^+} \) in urban areas), as the first guess of TS. Especially at urban locations, this algorithm is able to deconvolve two chemically and physically meaningful OA factors: an oxygenated OA factor (OOA) that represents SOA and a hydrocarbon-like OA factor (HOA) that represents POA associated with urban emissions [40–43]. An expanded version of the CPCA called multiple component analysis (MCA) was later developed to separate more than two factors [44]. Application of MCA to 37 AMS datasets acquired from various urban, rural, and remote atmospheric environments revealed that the sum of OOAs is often larger than the sum of HOA and other POA factors [6, 45–47], indicating that atmospheric OA are dominated by oxygenated species, mainly of secondary origin [6].

Positive matrix factorization (PMF)

PMF [48, 49] is a standard multivariate factor analysis model broadly used in the field of air pollution source apportionment. In recent years, it has seen more applications in factor analysis of quantitative aerosol mass spectrometry [34, 38, 50–52]. PMF models the data matrix (ORG) according to Eq. (1) as a positively constrained, weighted least-squares problem without a priori assumptions for either source (MS) or time (TS) profiles [38, 48]. The researcher chooses the number of factors, P, and the solution to PMF is the one that minimizes the sum of the weighted squared residuals (“Q value”, or “PMF quality-of-fit parameter”):

$$ Q = \sum\limits_{i = 1}^{t} \sum\limits_{j = 1}^{m} \left( e_{ij}/\sigma_{ij} \right)^{2} $$
(5)

where \( \sigma_{ij} \) is an element in the t × m matrix of estimated errors (1σ measurement precisions) corresponding to the variables (\( \mathrm{org}_{ij} \)) in ORG (Eq. 1). The purpose of this scaling is to weight each variable by its degree of measurement uncertainty, so that the factor analysis model can make use of the real information content of the dataset [48]. Each value in the solution matrices (i.e., MS and TS in Eq. 1) of PMF is constrained to be non-negative, reflecting the real atmospheric situation. The bilinear PMF model can be solved by several algorithms, with the PMF2 and multilinear engine (ME-2) software distributed by P. Paatero [48, 53] being the most commonly used.
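To make the objective in Eq. (5) concrete, the following sketch minimizes Q under non-negativity constraints using weighted multiplicative updates. This is a stand-in technique chosen for brevity; it is not the algorithm implemented in PMF2 or ME-2, and the function names (`q_value`, `weighted_nmf`), the synthetic two-factor data, and the constant error matrix are all assumptions made for illustration. Unlike PMF, multiplicative updates require a non-negative data matrix, so the synthetic data are clipped at zero.

```python
import numpy as np

def q_value(org, ts, ms, sigma):
    """Sum of squared, uncertainty-scaled residuals (Eq. 5)."""
    return float((((org - ts @ ms) / sigma) ** 2).sum())

def weighted_nmf(org, sigma, n_factors, n_iter=300, seed=0, eps=1e-12):
    """Stand-in solver (not PMF2/ME-2): multiplicative updates that reduce Q
    while keeping TS and MS non-negative. Requires org >= 0."""
    rng = np.random.default_rng(seed)
    t, m = org.shape
    w = 1.0 / sigma**2  # weights from the error matrix
    ts = rng.random((t, n_factors)) + 0.1
    ms = rng.random((n_factors, m)) + 0.1
    for _ in range(n_iter):
        ts *= ((w * org) @ ms.T) / ((w * (ts @ ms)) @ ms.T + eps)
        ms *= (ts.T @ (w * org)) / (ts.T @ (w * (ts @ ms)) + eps)
    # Rescale so each factor spectrum sums to 1; the product TS x MS is unchanged.
    scale = ms.sum(axis=1)
    return ts * scale, ms / scale[:, None]

# Hypothetical two-factor dataset with a constant 1-sigma error of 0.05.
rng = np.random.default_rng(42)
true_ms = np.array([[0.5, 0.3, 0.2, 0.0, 0.0, 0.0],
                    [0.0, 0.0, 0.1, 0.2, 0.3, 0.4]])
true_ts = rng.gamma(2.0, 1.0, size=(80, 2))
sigma = np.full((80, 6), 0.05)
noise = rng.normal(scale=0.05, size=(80, 6))
org = np.clip(true_ts @ true_ms + noise, 0.0, None)

ts_short, ms_short = weighted_nmf(org, sigma, n_factors=2, n_iter=5)
ts_fit, ms_fit = weighted_nmf(org, sigma, n_factors=2, n_iter=300)
q_short = q_value(org, ts_short, ms_short, sigma)
q_fit = q_value(org, ts_fit, ms_fit, sigma)
```

Running more iterations from the same starting point lowers Q (`q_fit` is smaller than `q_short`), and the returned matrices remain non-negative with unit-sum spectra.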

If the assumptions of the bilinear model are appropriate for the dataset and the error estimates are accurate, when the minimum Q value is achieved all elements in the matrix are fit to within their expected error, i.e., \( \left| e_{ij} \right|/\sigma_{ij} \approx 1 \). Then the expected value of Q (\( Q_{\exp} \)) should equal the degrees of freedom of the fitted data [54]:

$$ {Q_{{{ \exp }}}} = \left( {t \times m} \right) - P \times \left( {t + m} \right) $$
(6)

For AMS datasets, because \( t \times m \gg P \times (t + m) \), \( Q_{\exp} \approx t \times m \) (the number of points in ORG).

Thus, if the bilinear model is appropriate and the error estimates are accurate, the solution with the correct number of factors should give \( Q/Q_{\exp} \) near unity [38]. Values of \( Q/Q_{\exp} \gg 1 \) indicate either underestimation of the errors or variability in the factor mass spectra that cannot be modeled simply as the sum of the given number of factors. \( Q/Q_{\exp} \ll 1 \) indicates overestimation of the errors of the input data.
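As a short worked example of Eq. (6), the sketch below evaluates \( Q_{\exp} \) for a hypothetical dataset of 5000 spectra, 300 ions, and five factors. These sizes and the function name `q_expected` are assumptions chosen only to show how close \( Q_{\exp} \) is to t × m for matrices of this shape.

```python
def q_expected(t, m, n_factors):
    """Expected Q (Eq. 6): data points minus fitted parameters."""
    return t * m - n_factors * (t + m)

# Hypothetical dataset size: 5000 time steps, 300 ions, 5 factors.
t, m, P = 5000, 300, 5
q_exp = q_expected(t, m, P)
fraction_of_points = q_exp / (t * m)
```

For these sizes, `q_exp` is 1,473,500, within about 2% of the 1,500,000 points in the matrix, which is why \( Q_{\exp} \) is often approximated by t × m.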

PMF analysis of aerosol mass spectra

Lanz et al. [51] reported the first PMF study on an AMS dataset acquired in Zurich, Switzerland, in summer 2005 and identified six factors, including an HOA, two OOAs, and three factors linked to charbroiling, wood burning, and food cooking sources, respectively. The less oxidized OOA-2 factor was found to represent less processed, more volatile SOA [51]. Ulbrich et al. [38] also reported using PMF for identification of a semivolatile OOA-2 factor, in addition to a more oxidized, regional OOA-1 and an HOA in Pittsburgh. These PMF results agree well with the original two-factor (HOA and OOA) CPCA results of Zhang et al. [40, 55]. The fact that PMF is able to retrieve a low-concentration, yet distinct factor (OOA-2) highlights its strength in extracting information from datasets, e.g., resolving factors that make up a small fraction of the total mass. Based on PMF analysis of synthetic ORG matrices that were reconstructed assuming variable contributions from Pittsburgh OOA-1, OOA-2, and HOA factors, Ulbrich et al. [38] estimated that PMF of quadrupole AMS data (unit-mass resolution; UMR) can typically retrieve factors that account for at least 5% of the AMS mass.

The extraction of two distinct OOA subfactors was achieved later in PMF studies of a large number of other AMS datasets [13, 14, 18, 56]. Typically, the more oxidized OOA factor (OOA-1) correlates well with sulfate and is thought to be more aged and non-volatile. In contrast, the less oxidized OOA (OOA-2) is thought to be typically semivolatile because of its diurnal cycles and time trends that are similar to those of ammonium nitrate and chloride, both of which dynamically partition between gaseous and particulate phases depending on ambient temperature and humidity. The relative volatility characteristics of the two OOAs were confirmed by thermodenuder measurements [57–60]. In particular, Cappa and Jimenez [60] reported volatility distributions for both OOAs and other OA components for Mexico City. For these reasons, Jimenez et al. [13] introduced the more descriptive acronyms LV-OOA (low-volatility) and SV-OOA (semivolatile), respectively, that have become the standard terminology. However, the terminology LO-OOA and MO-OOA for less and more oxidized OOA, respectively, is also appropriate, especially for datasets for which volatility data are not available. More discussion on the differences among OOA subtypes is given in the section “OOA subtypes and interpretation”.

PMF studies have been conducted on AMS datasets acquired with both UMR and high-resolution (HR) mass spectrometers. Most of the earlier datasets are UMR but more HR-AMS-PMF results have been reported recently [22, 39, 50, 61–68]. A main advantage of the HR-AMS data is the separate quantification of different ions having the same nominal mass, enabling more precise characterization of the temporal variations of different ion types (e.g., \( \mathrm{C}_x\mathrm{H}_y^+ \), \( \mathrm{C}_x\mathrm{H}_y\mathrm{O}_z^+ \), \( \mathrm{C}_x\mathrm{H}_y\mathrm{N}_p^+ \), and \( \mathrm{C}_x\mathrm{H}_y\mathrm{N}_p\mathrm{O}_z^+ \)). The enhanced chemical resolution, and thus the higher information content in the HR-AMS datasets, is useful for constraining the PMF solutions, reducing their rotational ambiguity and leading to more easily interpretable solutions and, potentially, a larger number of interpretable OA factors. For example, Aiken et al. [64] reported that HOA and biomass burning OA (BBOA) were better separated using HR-AMS data than when the same data were analyzed as UMR, because their spectra are somewhat similar in UMR but very different in HR. In addition, the HR mass spectra of the OA factors also contain more information useful for interpreting their sources and processing.

Figure 2 shows an example of typical PMF results from an HR-AMS dataset, including the time series, HR spectra, and diurnal patterns of the individual OA factors. The dataset was acquired in New York City in summer 2009 [39]. Five OA factors were determined, each with distinct temporal variation and mass spectral patterns:

  1. LV-OOA (oxygen-to-carbon atomic ratio O/C = 0.63) that correlates strongly with sulfate;

  2. SV-OOA (O/C = 0.38) that correlates better with ammonium nitrate and chloride than LV-OOA does;

  3. a nitrogen-enriched OA (NOA) with a much higher N/C ratio (0.052) than other OA components (~0.004–0.011);

  4. a cooking-related OA (COA) which has spectral features similar to those of POA from cooking emissions and a distinctive diurnal pattern peaking during lunch and dinner times; and

  5. an HOA that represents POA from fossil fuel combustion given its low O/C ratio (0.06) and good correlation with primary combustion emission species, for example NO\(_x\) and EC.

Fig. 2

The mass spectra (left), time series (middle), and diurnal patterns (right) of five OA factors determined on the basis of PMF analysis of an HR-ToF-AMS dataset acquired in summer 2009 in New York, NY, USA. (a) LV-OOA, surrogate for regional, highly aged, low-volatility SOA; (b) SV-OOA, surrogate for less photochemically aged, semi-volatile SOA; (c) NOA, a nitrogen-enriched OA, probably derived from an SOA formed via acid–base chemistry or photochemical reactions of amino compounds, but possibly also by other mechanisms or by involvement of other reduced nitrogen compounds; (d) COA, a POA component probably dominated by cooking emissions; and (e) HOA, a surrogate for urban, combustion-related POA. In the time series plots, the corresponding time trends of tracer compounds are: (a) sulfate representing low volatility secondary aerosol species formed on regional scale; (b) nitrate and chloride representing semivolatile secondary species; (c) \( \mathrm{C_3H_8N^+} \) as a tracer ion for reduced nitrogen compounds; (d) \( \mathrm{C_6H_{10}O^+} \) as a tracer ion for cooking aerosols; and (e) elemental carbon and NO\(_x\) as tracer species for combustion emissions. In the HRMS of OA factors, each peak is colored on the basis of the contributions of five ion categories: \( \mathrm{C}_x\mathrm{H}_y^+ \), \( \mathrm{H}_y\mathrm{O}_1^+ \), \( \mathrm{C}_x\mathrm{H}_y\mathrm{O}_z^+ \), \( \mathrm{C}_x\mathrm{H}_y\mathrm{N}_p^+ \), and \( \mathrm{C}_x\mathrm{H}_y\mathrm{O}_z\mathrm{N}_p^+ \). The elemental and organic mass-to-carbon ratios for each factor are shown in the legends. (Adapted from Ref. [39]). A summary of the key diagnostic plots of the PMF results is shown in Fig. 5.

Detailed discussions of the association of each component with different sources and processes are given in Sun et al. [39].

In addition to AMS data, PMF has been applied to the OA data from other aerosol mass spectrometers. To our knowledge, there are two published studies, one from a SI-AMS and the other from an on-line GC–MS. Because both techniques determine individual molecules (or larger fragments or decomposition products) in aerosols, these analyses may be particularly useful for studying the sources and source contributions of OA. Williams et al. [69] performed PMF analysis on hourly time-resolution data of organic marker compounds measured with a thermal desorption aerosol GC–MS–FID (TAG) instrument from a study site in Riverside, California. The grouping of marker compounds in each factor was used to identify the presence of several different source types, including local vehicle emissions, food cooking operations, biomass burning, regional primary anthropogenic emissions, biogenic POA sources, several types of SOA, and semivolatile anthropogenic and biogenic OA. A key challenge in this type of tracer-based apportionment is to assign a fraction of OA mass to each factor, because the tracers account for only a small fraction (<20%) of the OA mass. Williams et al. [69] performed this step using a multivariate fit of their OA components to the AMS OA concentration. Figure 3 shows the diurnal cycles of OA sources obtained with this method during the summer study, with SOA being more important during the day and POA during the evening and night, in agreement with, e.g., the NYC results in Fig. 2 and AMS results at other locations. An earlier study applied principal-component analysis (PCA) to data from the same instrument and identified several sources due to transported and local anthropogenic pollution, transported and local biogenic emissions, and a local marine or dairy source, for a summer 2004 dataset in coastal Nova Scotia, Canada [70]. However, PCA apportions variance, and the resulting time series and mass spectra of the factors can contain negative values. PCA is therefore fundamentally different from PMF, which apportions the mass directly into nonnegative solutions that are physically meaningful.
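The contrast between variance apportionment and mass apportionment can be seen in a small numerical sketch. The two source profiles and concentrations below are hypothetical, and PCA is computed here from a singular value decomposition of the mean-centered data, which is one common way to obtain it. Even though the input matrix is strictly non-negative, the PCA scores and loadings contain negative entries.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical non-negative source profiles (2 sources x 4 variables).
profiles = np.array([[0.6, 0.3, 0.1, 0.0],
                     [0.0, 0.1, 0.3, 0.6]])
conc = rng.gamma(2.0, 1.0, size=(100, 2))  # non-negative contributions
data = conc @ profiles                     # non-negative data matrix

# PCA via SVD of the mean-centered data.
centered = data - data.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)
scores = u * s   # component "time series"
loadings = vt    # component "profiles"

has_negative_scores = bool((scores[:, :2] < 0).any())
has_negative_loadings = bool((loadings[:2] < 0).any())
```

Because the scores of centered data average to zero, every non-trivial component necessarily takes negative values, so the components cannot be read directly as mass concentrations.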

Fig. 3

Average diurnal concentrations of TAG-derived PMF factors over the summer focus period during the summer 2005 study in Riverside, California (inland Los Angeles area). PM\(_1\) organic aerosol mass concentrations are labeled outside the pie chart ring, and time of day is labeled inside the pie chart ring. (Fig. 15 in Ref. [69], reprinted with permission)

Dreyfus et al. [34] used PMF to study the time trends of 60 organic molecular and fragment ions measured with a photoionization aerosol mass spectrometer (PIAMS) with a time resolution of a few minutes. Six factors were identified and linked to POA sources, including diesel exhaust, car emissions/road dust, and meat cooking. The mass contributions of individual sources were subsequently estimated, similarly to Williams et al. [69], by combining the PMF results and the EC/OC data. Figure 4 shows the results of a factor attributed to meat cooking aerosol, based on:

  1. the mass spectrum that shows prominent peaks at m/z values corresponding to the molecular ions of palmitic, linoleic, stearic, and oleic acids, all of which are tracer compounds for meat cooking (Fig. 4a);

  2. the diurnal pattern that shows two characteristic peaks consistent with typical mealtimes (Fig. 4b); and

  3. the wind rose plot which shows features consistent with the locations of cooking facilities near the site (Fig. 4c).

Fig. 4

A meat cooking factor obtained via PMF analysis of a photoionization aerosol mass spectrometer (PIAMS) dataset acquired in fall 2007 in Wilmington, Delaware, USA. (a) Apportioned signal vs. m/z; (b) diurnal profile (error bars show the standard deviation of the range of values at each time point); (c) wind rose plot for high-impact periods (outer plot) and all data points (inner plot). (Fig. 3 in Ref. [34]. Copyright 2009, Elsevier, reprinted with permission)

Similar diurnal dependences were observed for cooking aerosol factors identified by factor analysis of AMS datasets at several locations, as discussed above for New York City [39, 51, 66, 71, 72].

Evaluation and selection of PMF solutions

Although a major objective of multivariate factor analysis is to explore underlying covariation of variables in a dataset to extract physically meaningful factors that can be related to distinct sources, processes, and/or properties, the solution algorithms provide only mathematical solutions that require careful evaluation and interpretation. In this section the various steps in PMF analysis are illustrated within the framework of AMS data. Ulbrich et al. [38] conducted a thorough assessment of PMF modeling of AMS data and discussed in detail several technical aspects of the analysis, including error matrix preparation, data pretreatment, selections of the optimum number of factors (P) and rotational forcing parameter (FPEAK), and evaluation of PMF solutions. These steps are summarized in Table 1. These authors also reported the development of an Igor-based (WaveMetrics, Lake Oswego, OR, USA) open-source PMF Evaluation Tool (PET, available at http://cires.colorado.edu/jimenez-group/wiki/index.php/PMF-AMS_Analysis_Guide#PMF_Evaluation_Tool_Software) that enables systematic probing of the PMF solution space, automated batch analyses, and user-friendly visualization and intercomparison of the solutions and residuals [38].

Table 1 Steps for preparing and choosing the best solution from PMF analysis of AMS datasets

Figure 5 shows a summary of key diagnostic plots useful for evaluating the final PMF results from an AMS dataset, which should also be useful for other aerosol mass spectrometers. An important first step of the PMF analysis is to decide the optimum number of factors that “best” explain the data. Ulbrich et al. [38] demonstrated that the trend of the PMF quality-of-fit parameter (Q) with the number of factors can be useful for identifying the minimum number of factors (Fig. 5a). A large decrease in \( Q/Q_{\exp} \) with the addition of a factor indicates that the additional factor is able to explain a significant fraction of the variation in the data unaccounted for by the others. In addition, examining the \( Q/Q_{\exp} \) contributions per column (m/z) or per row (t) in the matrix may help identify individual m/z values or time steps that affect the model lack-of-fit most strongly (Fig. 5h, i). Depending on the causes, and especially when non-physical effects (e.g., instrumental issues) cause extraneous variability that interferes with the PMF identification of the real components, the corresponding variables may be properly downweighted or even removed to reduce disproportionate effects on the fitting outcome [74].
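The per-column and per-row diagnostics described above can be sketched as follows. The normalization used here, dividing \( Q_{\exp} \) equally among the m columns or t rows, is an assumption made for illustration and may differ from the convention used in a given software package; the function name `q_contributions` and the synthetic data are likewise hypothetical. To keep the example deterministic, the residuals are computed against the known synthetic factors rather than a fitted solution, and one ion is given noise five times larger than its stated error.

```python
import numpy as np

def q_contributions(org, ts, ms, sigma):
    """Q/Q_exp per time step (row) and per ion (column), assuming Q_exp
    is shared equally among rows or columns (illustrative convention)."""
    scaled_sq = ((org - ts @ ms) / sigma) ** 2
    t, m = org.shape
    q_exp = t * m - ts.shape[1] * (t + m)
    per_time = scaled_sq.sum(axis=1) / (q_exp / t)
    per_mz = scaled_sq.sum(axis=0) / (q_exp / m)
    return per_time, per_mz

rng = np.random.default_rng(7)
t, m = 500, 10
ms = rng.random((2, m))
ms /= ms.sum(axis=1, keepdims=True)
ts = rng.gamma(2.0, 1.0, size=(t, 2))
sigma = np.full((t, m), 0.02)            # stated 1-sigma errors
noise = rng.normal(scale=0.02, size=(t, m))
bad = 3                                   # ion with underestimated error
noise[:, bad] *= 5.0
org = ts @ ms + noise

_, per_mz = q_contributions(org, ts, ms, sigma)

# Downweight the problematic ion by inflating its error estimate.
sigma_dw = sigma.copy()
sigma_dw[:, bad] *= 5.0
_, per_mz_dw = q_contributions(org, ts, ms, sigma_dw)
```

The ion with underestimated errors stands out with a \( Q/Q_{\exp} \) contribution far above 1, and inflating its error estimate brings the contribution back in line with the other ions.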

Fig. 5

Summary of key diagnostic plots of the PMF results for an HR-ToF-AMS dataset acquired in New York in 2009 (the PMF results are given in Fig. 2): (a) \( Q/Q_{\exp} \) as a function of number of factors (P) selected for PMF modeling. For the five-factor solution (i.e., the best P): (b) \( Q/Q_{\exp} \) as a function of FPEAK, (c) fractions of OA factors vs. FPEAK, (d) correlations among PMF factors, (e) the box and whiskers plot showing the distributions of scaled residuals for each m/z, (f) time series of the measured organic mass and the reconstructed organic mass (= LV-OOA + SV-OOA + NOA + COA + HOA), (g) variations of the residual (= measured − reconstructed) of the fit, (h) the \( Q/Q_{\exp} \) for each point in time, and (i) the \( Q/Q_{\exp} \) values for each m/z. (Adapted from Ref. [39])

The model residual time series (i.e., the difference between the summed measured mass spectrum and its modeled approximation; Fig. 5g) is particularly useful for evaluating the solutions of PMF and related methods [38, 41]. The presence of time-dependent structure in the residual suggests the need for additional factor(s) for better fitting. However, for ambient datasets (Fig. 5g), it is common that substantial structure remains in the residual time series after all physically meaningful factors have been assigned. A main reason for this is true variations in the spectra of the factors, which cannot be captured with a reasonable number of components given the assumption of constant spectra in bilinear models including PMF [38]. As Ulbrich et al. [38] pointed out, this assumption of constant factor mass spectra in bilinear methods can limit the retrievability of small factors from AMS datasets.

When P is chosen, the stability, uniqueness, and interpretability of the factor solutions should be checked. The rotational ambiguity of the solutions may be explored by changing the FPEAK [48]. Both Lanz et al. [51] and Ulbrich et al. [38] discussed the variations in factors vs. FPEAK (Fig. 5b). The robustness of the solution may also be examined by running the PMF algorithm from different random starting points (SEED parameter). Variations in different plausible solutions corresponding to different FPEAK or SEED values may be evaluated to determine the uncertainties of the PMF solution [38, 49, 75]. In addition, the uncertainty of the solution corresponding to a given P and FPEAK can be analyzed quantitatively using bootstrapping analysis [38]. Finally, the interpretability of the OA factors should be evaluated on the basis of their mass spectral features and temporal variation patterns (details are given in the section “Interpretation of the extracted OA factors”).

Interpretation of the extracted OA factors

The objective of interpreting the solutions of PMF and similar methods is to identify and validate the relationships between OA factors and distinct emission sources, physicochemical properties, and atmospheric processes. The interpretability of the OA factors is also an important criterion for evaluating the quality of the multivariate analysis. The interpretations of the OA factors are usually based on the following considerations:

  1. the temporal correlations of factors with tracer species representative of specific emissions and processes;

  2. the mass spectral features of each factor, for example peak distribution patterns, signature fragments, and oxidation state;

  3. the repetitive temporal or diurnal variation patterns that are indicative of specific human activities or meteorological patterns (for example traffic rush hours, dilution because of the increase of the planetary boundary layer, cooking emissions during mealtimes, photochemical production of secondary species, etc.);

  4. the estimated size distributions of OA factors (or tracer ions) and their evolution patterns;

  5. information regarding airmass trajectories and locations of upwind source regions; and

  6. other collocated observations that enable the isolation of special cases (e.g., new particle formation and growth events identified according to scanning mobility particle sizer measurements [40] and well-defined SOA growth events [42]).

The correlations between the time series of OA factors and those of independent external tracer species (i.e., species not included in ORG) are especially important for addressing the physical meaning of the OA factors. High time resolution of the OA factors greatly facilitates interpretation of their physical meaning, and this is probably the most important reason for the rapid acceptance of factor analysis by the research community. The fast measurements capture the dynamic variations caused by true changes in aerosol sources and transport, while minimizing the uncertainties caused by apparent correlations with tracer species because of longer time averages in, e.g., filter analyses.

An important step of factor interpretation is to compare the extracted factor spectra with reference spectra sampled from various source types, a large number of which have been published in the literature [14, 76–84] and are publicly available in the AMS Spectral Database at http://cires.colorado.edu/jimenez-group/AMSsd/. An especially useful set of spectra for ambient HOA, SV-OOA, LV-OOA, and OOA was reported by Ng et al. [14] by averaging OA factors determined from a large number of ambient AMS datasets. The similarity between two mass spectra (or two time series) can be evaluated using Pearson’s R, the coefficient of determination (\( R^2 \)), or the uncentered correlation coefficient. In addition to comparisons of the full mass spectra, examining the correlations among peaks above m/z 44 can avoid biases caused by small m/z ions that generally dominate the mass spectra [38, 51]. At times, however, unrealistic factors can have spectra which look similar to those in the database, so this criterion is not sufficient for supporting the identification of a factor [38]. The presence of key marker ions, for example m/z 44 for OOA or m/z 60 for BBOA, is another useful criterion for factor interpretation [14].
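The similarity metrics mentioned above can be computed as in the following sketch. The function names and the two short spectra are hypothetical and serve only to illustrate the calculation; real comparisons would use full factor and reference spectra on a common m/z axis. The optional `min_mz` argument restricts the comparison to peaks above a chosen m/z, such as 44.

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation coefficient between two vectors."""
    return float(np.corrcoef(a, b)[0, 1])

def uncentered_r(a, b):
    """Uncentered correlation: cosine of the angle between two vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def compare_spectra(mz, spec_a, spec_b, min_mz=None):
    """Compare two spectra, optionally using only peaks above min_mz."""
    mz = np.asarray(mz)
    a = np.asarray(spec_a, dtype=float)
    b = np.asarray(spec_b, dtype=float)
    if min_mz is not None:
        keep = mz > min_mz
        a, b = a[keep], b[keep]
    r = pearson_r(a, b)
    return {"pearson_r": r, "r_squared": r**2,
            "uncentered_r": uncentered_r(a, b)}

# Hypothetical factor and reference spectra on a shared m/z axis.
mz = np.array([29, 43, 44, 55, 57, 60, 73])
factor = np.array([0.10, 0.15, 0.40, 0.12, 0.08, 0.10, 0.05])
reference = np.array([0.12, 0.13, 0.42, 0.10, 0.09, 0.09, 0.05])

full = compare_spectra(mz, factor, reference)
above_44 = compare_spectra(mz, factor, reference, min_mz=44)
```

Comparing `full` with `above_44` shows how much of the apparent agreement is carried by the dominant low-m/z peaks.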

Examples of established tracer-factor relationships are discussed here. A large number of studies have demonstrated the dominant association of OOA with SOA, based on observations that OOA generally correlates well with:

  1. secondary inorganic aerosol species: sulfate, nitrate, and/or non-refractory chloride (e.g., ammonium chloride) [6, 38–40, 42, 66, 72, 85, 86];

  2. water-soluble organic carbon (WSOC) concentrations [87]; and

  3. gas-phase photochemically-produced species such as odd oxygen \( \left( \mathrm{O}_x = \mathrm{NO}_2 + \mathrm{O}_3 \right) \) [88, 89] or glyoxal [42].

The association of HOA with OA from primary sources has been supported by collocated measurements of tracer species associated with combustion emissions, including CO, NOx, polycyclic aromatic hydrocarbons (PAH), and black or elemental carbon (BC/EC). Specific evidence includes:

  1. good correlation between the concentrations of HOA and combustion tracer species [6, 38–41, 64, 66, 72, 85, 128];

  2. estimated emission ratios of HOA against EC, NOx, and CO (i.e., HOA/EC, HOA/NOx, or HOA/CO) consistent with source measurements [6, 39, 40, 43, 64]; and

  3. consistency between HOA and POA concentrations estimated using tracer-based approaches, for example the EC tracer method, the CO tracer method, and the chemical mass balance (CMB) model using organic molecular markers [40, 43, 63, 64].

In addition, the mass spectra of HOA from various studies generally show fragmentation patterns characteristic of long-chain hydrocarbons and are very similar to those of diesel exhaust, lubricating oil, and freshly emitted traffic aerosols observed in urban areas [14, 41]. HOA appears chemically reduced with average oxygen-to-carbon (O/C) ratio typically less than 0.1 [13, 14, 31, 39, 40].

The identification of OA factors associated with biomass burning (i.e., BBOA) has been supported by correlations of this factor with biomass burning emission tracers (e.g., acetonitrile, levoglucosan, potassium, and non-fossil EC), elevated peaks at m/z 60 (C2H4O2+) and 73 (C3H5O2+) in the mass spectra of the factor, and model dispersion analyses from the locations of known forest fires [22, 65, 79, 80, 83, 85, 90].

COA has been identified in several studies as discussed above. In other studies, especially those using UMR data, it is often not separately identifiable and may be part of the HOA and/or BBOA factors [72], because of the similarity between the COA spectra (especially the UMR spectra) and the HOA and BBOA spectra [83]. External tracers of food cooking are not usually available, but the diurnal profiles of the factors show that HOA peaks during rush-hour periods whereas COA peaks during typical meal times. Sun et al. [39] suggested that the C5H8O+, C6H10O+, and C7H12O+ ions in the HR-AMS spectra may potentially be useful as AMS-spectral markers for COA. In addition, the COA and HOA factors could be differentiated on the basis of the signal ratio of m/z 55 to m/z 57, as the COA spectrum tends to show a substantially higher m/z 55 to 57 ratio [39, 83, 84].

Nitrogen-enriched OA or local OA factors (NOA or LOA) have been reported in several studies [39, 58, 64]. The NOA or LOA mass spectrum has important contributions from many nitrogen-containing fragments that are not observed in other factor mass spectra and that are consistent with reduced nitrogen species such as amines, amides, or nitriles [39]. Although no external tracers have been identified to help link NOA components to a particular source or process, NOA factors tend to have spiky time series and are, therefore, likely to be the result of more local emissions; i.e., if NOA were emitted or produced farther away, it would disperse in the atmosphere and have a smoother time series.

Analysis of the air mass trajectory histories and comparisons with the results of other source apportionment techniques may provide further support for the interpretation of the OA factors [22, 41, 63, 72, 91, 92]. In addition, the chemically-resolved size-distribution data from the AMS are valuable for elucidating the sources and processes of OA factors. For instance, the frequently-observed similarity between the size distribution of m/z 44 (AMS tracer for OOA) and sulfate supports the association of OOA with SOA [39, 40, 43, 93–96]. Zhang et al. [40] estimated the size distributions for OOA and HOA based on measured size distributions of m/z values 44 and 57 and the mass spectral patterns of OOA and HOA using a UMR AMS dataset acquired in Pittsburgh in 2002, as shown in Fig. 6. The size distribution of HOA shows a distinct ultrafine mode that is commonly observed for primary particles from fresh combustion emissions [12, 76, 97], most prominently during morning rush hours and at night when boundary layer height is low and atmospheric dilution of the primary emissions is weak. In contrast, OOA is concentrated in the accumulation mode peaking between 400 and 600 nm (in vacuum aerodynamic diameter; dva [30]) and seems to be mostly internally mixed with sulfate, a secondary inorganic species. The short lifetime of ultrafine particles supports the association of HOA with more local sources. In contrast, the diurnal variations of OOA and sulfate are relatively weak, both in terms of size distributions and concentrations, indicating strong effects from regional sources and processing. The fact that OOA and sulfate correlate well both in concentration and size distribution strongly supports the dominant secondary contribution to OOA, and may also be indicative of the effect of cloud processing on SOA production similar to that on sulfate.
Furthermore, an additional piece of evidence for the secondary nature of OOA is that the evolution pattern of OOA size distribution during an intense new particle formation and growth event clearly indicates growth of OOA via surface condensation (i.e., gas to particle conversion) [40, 95].
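One way to picture how factor size distributions can be estimated from the size distributions of two marker ions is as a two-by-two linear unmixing carried out in every size bin. The sketch below is only an illustration of that idea under stated assumptions: the marker-ion fractions in `A` are hypothetical placeholders rather than values from Ref. [40], and the actual procedure used there may differ in detail.

```python
import numpy as np

# Hypothetical fractions of each factor's spectrum found at the two marker ions.
# Rows: m/z 44, m/z 57. Columns: OOA, HOA. These numbers are assumptions.
A = np.array([[0.150, 0.010],
              [0.005, 0.080]])

def unmix_size_distributions(s44, s57, fractions=A):
    """Solve, for every size bin, the linear system
         s44 = A[0,0]*OOA + A[0,1]*HOA
         s57 = A[1,0]*OOA + A[1,1]*HOA
    and return the (OOA, HOA) size distributions."""
    signals = np.vstack([s44, s57])             # shape (2, n_bins)
    conc = np.linalg.solve(fractions, signals)  # shape (2, n_bins)
    return conc[0], conc[1]

# Synthetic check: build marker-ion signals from known distributions, then recover them
ooa_true = np.array([0.1, 0.5, 2.0, 3.0, 1.0])   # toy accumulation-mode shape
hoa_true = np.array([1.5, 1.0, 0.6, 0.3, 0.1])   # toy ultrafine-mode shape
s44 = A[0, 0] * ooa_true + A[0, 1] * hoa_true
s57 = A[1, 0] * ooa_true + A[1, 1] * hoa_true
ooa_est, hoa_est = unmix_size_distributions(s44, s57)
```

The approach relies on the same linear additivity assumed by the bilinear model, and it only works when the two marker ions have clearly different relative abundances in the two factor spectra, so that the matrix is well conditioned.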

Fig. 6

Average diurnal variations of the estimated size distributions of (a) HOA, (b) OOA, and (c) sulfate during 7–22 September, 2002, in Pittsburgh, USA. (Figure 6 in Ref. [40], reprinted with permission)

Advanced factor analysis of aerosol mass spectra

In addition to standard PMF bilinear modeling, more advanced factorization approaches have been applied to aerosol mass spectrometry data to improve the specificity and interpretability of the solutions. Slowik et al. [50] successfully performed PMF analysis on a combined matrix of the OA mass spectra acquired with a UMR AMS and the mass spectra of volatile organic compounds acquired with a proton-transfer-reaction mass spectrometer (PTR-MS) during winter in Toronto. The uncertainties used in PMF for each instrument were scaled to result in similar weights in the PMF analysis. Six factors characteristic of charbroiling, traffic, aged SOA, local SOA, oxygenated POA, and a local point source were identified, with information on the temporal and source profiles of both OA and VOCs for each factor (Fig. 7). According to the authors, PMF analyses of the separate AMS or PTR-MS datasets were not able to identify as many factors, because of effects (e.g., collocated emissions and meteorological variations) that blur the distinctions between primary and secondary species in the same phase; the enhanced variance in the unified AMS/PTR-MS dataset was thought to have enabled the distinction of more similar factors. Another important advantage of incorporating the VOC data was the simultaneous and coherent apportionment of VOCs to the same emission sources and atmospheric processes represented by the OA factors. The availability of both VOC and OA profiles also facilitated the interpretation of the factors. However, when atmospheric aging is more important than in this wintertime northern-latitude dataset, the VOC profiles are strongly distorted by the photochemistry in a time-dependent manner which is inconsistent with PMF’s assumptions [98], and a similar joint AMS + PTR-MS analysis has not been reported under those conditions.

Fig. 7

PMF analysis results of a unified dataset of AMS and PTR-MS measurements acquired in winter, 2007, from Toronto, Canada. Mass spectra (a) and time series (b) of the PMF factors (black traces, left axis) and selected tracer species (colored traces, right axis). The time series of PTR-MS m/z 69 is plotted in arbitrary units. (Figure 11 in Ref. [50], reprinted with permission)

Lanz et al. [85] factored an AMS dataset collected in Zurich during a period dominated by wintertime inversions. Because of the long residence time of air masses, species from different sources co-varied to such an extent that standard PMF analysis could not separate physically meaningful source profiles. This is because of an inherent limitation of PMF and similar algorithms, which have difficulty resolving factors that are too similar in either mass spectra or time series. To enable the extraction of physically meaningful factors, Lanz et al. [85] incorporated estimated source profiles and solved the bilinear model by use of multilinear engine (ME-2) software [53, 99]. ME-2 uses a different algorithm that can solve the “standard” bilinear PMF model as well as many other factorization models. The introduction of a-priori source profiles in the fitting, which is not a requirement when using ME-2, can be viewed as a hybrid of the PMF and the chemical mass balance (CMB) model, which determines the relative source contributions of OA factors using fixed known source and/or mass spectral profiles [100]. Using the MS of POA from diesel bus emission experiments as the first-guess a-priori profile and allowing the profile to partially deviate from the a-priori one, three distinct OA factors, including an OOA mostly representing SOA, a factor representing BBOA from wood combustion, and a traffic-related HOA, were identified from the Zurich dataset [85]. Figure 8 shows how the spectral similarities between hybrid and reference OA factors change as a function of the regularization parameter (“a-value”), which defines the degree of constraint of the HOA factor. An “a-value” of 0 means that the a-priori profile (i.e., the mass spectrum of the HOA factor) is not allowed to change during iterative fitting.
This figure indicates that a = 0.6 is a good compromise for this dataset because it allows flexibility in the solution while the factor spectra still show high similarity to reference profiles.
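The role of the a-value can be illustrated with a toy constrained factorization. The sketch below does not use ME-2 or PMF: it swaps in plain, unweighted non-negative matrix factorization with multiplicative updates, and it assumes that the constraint takes the element-wise form f_ref·(1 − a) ≤ f ≤ f_ref·(1 + a) for the constrained profile. The function name, the synthetic data, and that specific form of the bound are assumptions made for illustration.

```python
import numpy as np

def constrained_nmf(X, f_ref, n_factors, a, n_iter=500, seed=0):
    """Factor X (time x m/z) as G @ F with non-negative G and F.
    Row 0 of F starts at f_ref and is clipped to f_ref*(1 -/+ a) after every update
    (assumed form of the a-value constraint; a = 0 freezes the profile)."""
    rng = np.random.default_rng(seed)
    n_t, n_mz = X.shape
    G = rng.random((n_t, n_factors)) + 0.1
    F = rng.random((n_factors, n_mz)) + 0.1
    F[0] = f_ref
    lo, hi = (1.0 - a) * f_ref, (1.0 + a) * f_ref
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        G *= (X @ F.T) / (G @ F @ F.T + eps)   # update time series
        F *= (G.T @ X) / (G.T @ G @ F + eps)   # update profiles
        F[0] = np.clip(F[0], lo, hi)           # enforce the a-value bounds
    return G, F

# Synthetic two-factor dataset (made-up profiles, illustration only)
rng = np.random.default_rng(1)
f_hoa = np.array([0.05, 0.30, 0.35, 0.30])
f_ooa = np.array([0.60, 0.20, 0.10, 0.10])
X = rng.random((50, 2)) @ np.vstack([f_hoa, f_ooa])

G0, F0 = constrained_nmf(X, f_hoa, n_factors=2, a=0.0)   # fully fixed profile
G6, F6 = constrained_nmf(X, f_hoa, n_factors=2, a=0.6)   # partially free profile
```

With a = 0 the constrained row never leaves the reference profile, which corresponds to the fully fixed case described above; larger values give the fit progressively more freedom.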

Fig. 8

Results from hybrid PMF-CMB analysis of an AMS dataset acquired during wintertime in Zurich, Switzerland. (a) Spectral similarity (R²) of the retrieved HOA factor spectrum to the reference mass spectra representing urban combustion POA and the changes of the HOA factor as a function of the regularization parameter (“a-value”), which is the degree of deviation from the initial spectrum allowed for the HOA factor. Similarity (R²) of the mass spectra of wood burning (b) and OOA (c) factors to reference mass spectra representing BBOA and OOA, respectively, as a function of the “a-value”. The values in the parentheses before the line symbols are the numbers of references cited in that paper. (Fig. 3 in Ref. [85]. Copyright 2008. American Chemical Society, reprinted with permission)

DeCarlo et al. [62] reported the application of PMF to flight data from the Mexico City region, when fresh urban and biomass burning emissions mixed and underwent very strong photochemical aging. In this case PMF identified HOA, BBOA, SV-OOA, and LV-OOA factors, but evidence such as ratios of components to tracer species indicated the combined effect of urban and biomass burning sources in several factors. These authors applied a “postprocessing” step to the PMF output in order to apportion fractions of some of the components to urban and biomass burning sources, based on ratios of components to tracers and the observed variations between days with and without the effect of intense biomass burning.

Ng et al. [101] first reported the application of full CMB modeling to AMS data using experimentally determined aerosol source profiles as fixed input mass spectra. A main advantage of CMB is the ease of solution using a simple linear decomposition algorithm, which can be performed on data acquired in real-time without waiting for a sufficiently large dataset for PMF analysis. Consistent with the discussion of the previous study, CMB may also be able to separate factors that correlate strongly in time, because of the collocations of emission sources and/or meteorological effects. However, CMB is only suitable when appropriate and complete source profiles are available. Average mass spectral profiles for ambient HOA, OOA, LV-OOA, and SV-OOA have recently been determined by Ng et al. [14] by averaging those observed in many studies. When performing CMB of several datasets, the component concentrations were typically within 30% of those determined using full PMF analysis.
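Because the profiles are fixed in CMB, apportioning each measured spectrum reduces to a small non-negative least-squares problem, which is why it can be run on data as they arrive. The sketch below is a minimal illustration under stated assumptions, not the algorithm of Ref. [101]: it uses simple multiplicative updates to keep the contributions non-negative, omits measurement-error weighting, and uses made-up profiles.

```python
import numpy as np

def cmb_apportion(X, F, n_iter=300):
    """Estimate non-negative contributions G such that X ~ G @ F, with F held fixed.
    X: (n_spectra, n_mz) measured spectra; F: (n_sources, n_mz) fixed source profiles.
    Multiplicative updates keep every entry of G non-negative."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    G = np.ones((X.shape[0], F.shape[0]))
    FFt = F @ F.T
    XFt = X @ F.T
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        G *= XFt / (G @ FFt + eps)
    return G

# Hypothetical fixed profiles (rows: HOA-like, OOA-like), illustration only
F = np.array([[0.05, 0.30, 0.35, 0.30],
              [0.60, 0.20, 0.10, 0.10]])

# Synthetic measurements built from known contributions
rng = np.random.default_rng(2)
G_true = rng.random((20, 2)) + 0.5
X = G_true @ F

G_est = cmb_apportion(X, F)            # batch use on a whole dataset
g_single = cmb_apportion(X[0], F)      # the same call on one incoming spectrum
```

The result is only as good as the supplied profiles: a source missing from `F` has its signal forced into the sources that are present, which mirrors the caveat above about needing appropriate and complete source profiles.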

Recently, Ulbrich et al. [52] reported for the first time the application of two three-dimensional factorization models (“PARAFAC” and “Tucker-1”), solved with the PMF3 [102] and the ME-2 [53] algorithms, to the size-resolved mass spectral dataset from an HR-ToF-AMS. In contrast with quadrupole AMSs, AMS systems using time-of-flight mass spectrometers are able to acquire full mass spectra of OA for each particle size, i.e., to determine the size distributions of each individual m/z in the spectrum. By applying 3D factorization, Ulbrich et al. [52] determined the size distributions of four OA factors: HOA, OOA, BBOA, and a nitrogen-enriched factor (NOA or LOA). These factors were consistent with source measurements and previous estimates of the component size distributions, and enabled identification of cases when, e.g., HOA-containing particles grew strongly because of the condensation of secondary species. An important aspect of this work is that the 3D factorization approaches, methodologies, and considerations are generally applicable to multi-dimensional mass spectral datasets such as those from thermal-desorption mass spectrometers or hyphenated aerosol mass spectrometers (e.g., coupled with chromatography for chemical separation).

Insights into organic aerosol sources, processes, and properties

Overview of OA factors and their spatial distribution

An important advantage of factor analysis is that it reduces the extremely complex OA composition to a limited number of chemically and physically meaningful components that can be linearly combined to reproduce the observed time and chemical variations in ambient OA. A broad overview on the chemistry, variability, and evolution characteristics of atmospheric submicron OA is emerging from the analysis of a large number of AMS datasets acquired from worldwide locations. Zhang et al. [6] performed an integrated analysis of 37 AMS datasets collected from three continents (North America, Europe, and Asia) and concluded that oxygenated OA species (surrogate for SOA) are ubiquitous and dominant in the Northern Hemisphere. PMF analyses of these worldwide AMS datasets further led to the compilation of a global picture on the oxidation states and dynamic evolution of multiple OA factors [13, 14, 103], which complements those obtained by use of other techniques [104]. In addition, two recent studies discussed the spatial distribution of aerosol chemical composition and the evolution of OA in Europe based on measurements of aerosol chemical composition across the continent [18, 56].

Figure 9 shows a summary of the results from PMF analysis of 43 worldwide datasets. Usually 2–5 OA components were extracted for each study, consistent with those discussed above. It is important to note that the mass spectrum corresponding to any given factor is similar but not identical across multiple sites [14]. From Fig. 9 it is clear that the average mass concentrations and compositions of submicron particles vary substantially across sites. Whereas both HOA and OOA loadings decrease with distance from urban sites, the decrease of HOA is much more pronounced and the average OOA concentrations are often of the same order in various atmospheric environments [6]. The relative contribution of OOA evolves from an average of 63% of the organic matter in submicron particles at urban locations to usually more than 90% in rural and remote locations [6]. This difference in relative contributions reflects the spatial differences in the sources of HOA and OOA, respectively. Primary sources of OA (particularly HOA) are highest in urban centers and are quickly diluted when advected away from their sources. SOA is thought to be produced on time scales of a day or more in amounts that greatly exceed the urban HOA [10], so OOA corresponds not only to locally produced SOA but also SOA produced downwind of polluted areas and thus has a larger footprint (i.e., over wider regions). In addition, the source region for SOA precursors has additional sources with a more regional footprint, including regional biogenic emissions and biomass burning emissions.

Fig. 9

(a) Average total mass concentration and composition of PM1; (b) average mass concentrations of oxygenated OA factors (including OOAtotal, LV-OOA, and SV-OOA); and (c) average mass concentrations of primary OA factors (including HOA, BBOA, and COA) in PM1 at individual sites in the northern hemisphere. (Plotted using data from Zhang et al. [6], Jimenez et al. [13], and Ng et al. [14])

Figure 10 provides a clear example of this difference in HOA and OOA source footprints. The data points shown in the figure correspond to AMS and CO2 (as a combustion tracer) gradient measurements downwind of a highway source (I-93 in Somerville, MA, USA) reported in Canagaratna et al. [12]. The measurements were made aboard a mobile laboratory equipped both with instruments that responded rapidly to aerosols and with gas phase instruments, with AMS measurements performed every few seconds [105]. The OOA concentrations were not significantly affected by the highway source and no gradient was observed. The gradient in HOA concentrations, on the other hand, mirrors the decreasing CO2 concentrations, and is indicative of the effect of dilution as the pollution plume is mixed with ambient air.

Fig. 10

Profiles of the concentrations of HOA, OOA, and CO2 (as an indicator of dilution) as a function of distance downwind of a major highway (Interstate 93) in Somerville, MA, USA. (Adapted from Fig. 4a in Ref. [12])

When HOA particles mix with background air, they can also become coated with background secondary aerosol material (i.e., species such as ammonium sulfate, ammonium nitrate, and secondary organic species). Cross et al. [106] used PMF analysis of single particle mass spectra measured with the single particle light-scattering AMS (SP LS-AMS) to show that HOA particles, which were likely to be emitted as POA from a nearby highway, quickly become internally mixed with secondary inorganic and organic compounds after emission. Canagaratna et al. [12] determined the effective densities of the single particles examined in the Cross et al. [106] study and confirmed that they were consistent with their PMF-based classification as “pure” and “mixed” HOA single particles. Taken together, these results indicate that coating processes are likely to modify HOA particles and lead to an evolution of their properties and effects with time.

OOA subtypes and interpretation

In environments in which chemical variation in the OOA is not significant, either because of lack of change in the meteorological conditions (e.g., ambient temperature), source effect, or photochemical age distribution, only one OOA is extracted [14]. However, in environments where OOA is subject to continuous evolution, several OOA factors which represent the end points of a relatively continuous chemical variation arising from different levels of aging are observed [14, 18, 39, 91]. In many cases LV-OOA and SV-OOA, which appear to be surrogates for regional/more-aged and fresher/less-aged SOA, respectively, are observed [38, 39, 51, 62–64]. The volatilities of OOA subtypes were confirmed by measurements of aerosol volatility using a thermodenuder to remove semivolatile species in a temperature-dependent manner [58, 107] or inferred according to their correlations with ammonium sulfate (suggesting low volatility) and with ammonium nitrate and chloride (suggesting semi-volatility) [101]. Whereas the separation of LV-OOA and SV-OOA is frequently reported, especially during summertime, a few studies have reported OOA factors with different O/C ratios but similar volatility [66, 72, 91]. In these cases the SV-OOA and LV-OOA terminology is not appropriate, and that of more and less oxygenated OOA (MO-OOA and LO-OOA) should be used instead. The different OOA factors may also reflect other sources or processing factors that cause differences in the spectra of groups of oxygenated species.
In particular, at some locations affected by distinguishable biogenic and anthropogenic impact periods, for example Chebogue Point, Nova Scotia [82], Whistler Peak, British Columbia [108], and the Egbert rural site 70 km north of Toronto [67], the less oxidized OOA component seems to be associated with biogenic emissions (i.e., a surrogate for biogenic SOA) whereas the more oxygenated OOA seems to be associated with air masses transported from polluted regions (i.e., a surrogate for anthropogenic and anthropogenically controlled biogenic SOA).

Ng et al. [14] studied the aging of OA components in the atmosphere by examining differences between the mass spectra of LV-OOA and SV-OOA on the basis of two main OOA ion fragments at m/z 44 (CO2+) and m/z 43 (mostly C2H3O+). The composition differences between the OOA components are reflected in the different intensities of these two ions. As shown in Fig. 11, the LV-OOA component spectra have a higher f44 (ratio of m/z 44 to total signal in the component mass spectrum) and lower f43 (defined similarly) than SV-OOA. When f44 (a surrogate for O/C ratio and an indicator of photochemical aging) is plotted against f43, ambient OOA components lie within a well-defined triangular region. The different OOA components in Fig. 11 offer snapshots of the continuum of evolving OOA properties in ambient aerosol. Morgan et al. [18] have shown a similar continuum of AMS OOA components in aircraft measurements over Europe. The less aged SV-OOA occupies the broader base of the triangle, which is likely to reflect the variable composition of fresher SOA formed from site-specific precursors and sources. The LV-OOA, on the other hand, occupies the narrowing top region of the triangle. LV-OOA factors are spectrally similar to fulvic acid and HULIS sample spectra [14]. This, together with the fact that high values of f44 in AMS spectra are indicative of acid groups [109, 110], suggests that the highly oxidized LV-OOA is likely to contain polyacidic or acid-derived moieties.

Fig. 11

f44 vs. f43 in the mass spectra of ambient OOA and various HULIS and fulvic acid samples. The dotted lines define the triangular space where ambient OOA components fall. The O/C ratios were estimated using the formula given in Aiken et al. [31]: \( \mathrm{O/C} = 0.0382 \times f_{44} + 0.0794 \) (Adapted from Fig. 4 in Ref. [14], reprinted with permission)
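As a worked illustration of the quantities used in Fig. 11, the sketch below computes f44 and f43 from a spectrum and applies the linear O/C parameterization quoted in the caption. It assumes that f44 enters the formula as a percentage of the total signal, which is what the size of the 0.0382 coefficient suggests; the helper names and the toy spectrum are hypothetical.

```python
import numpy as np

def ion_fraction(mz, spectrum, target_mz):
    """Fraction of the total spectral signal found at target_mz (e.g., f44 or f43)."""
    mz = np.asarray(mz)
    spectrum = np.asarray(spectrum, dtype=float)
    return float(spectrum[mz == target_mz].sum() / spectrum.sum())

def oc_from_f44_percent(f44_percent):
    """O/C = 0.0382 * f44 + 0.0794, with f44 in percent (assumed unit convention)."""
    return 0.0382 * f44_percent + 0.0794

# Toy OOA-like spectrum (made-up intensities, illustration only)
mz = np.array([43, 44, 55, 57])
spectrum = np.array([0.08, 0.15, 0.40, 0.37])

f44 = ion_fraction(mz, spectrum, 44)    # 0.15, i.e., 15% of the signal
f43 = ion_fraction(mz, spectrum, 43)    # 0.08
oc = oc_from_f44_percent(100.0 * f44)   # about 0.65
```

Plotting the (f43, f44) pair of each factor then places it in the triangular space of Fig. 11, with more aged factors lying toward higher f44.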

A key implication of Fig. 11 is that photochemically aged SOA converges towards the same highly oxidized endpoint regardless of its original source. The common features of the OOA evolution shown in Fig. 11 potentially enable a simplified description of the oxidation of OA in the atmosphere. In fact, Jimenez et al. [13] recently used the PMF results from the datasets shown in Fig. 9 together with laboratory data to propose a two-dimensional modeling framework to map the evolution of OA. The two axes are volatility and O/C ratio. Because both of these axes can be experimentally obtained, this 2D model provides a useful framework for describing OA evolution that can be constrained by measurements [111]. O/C ratios of OA can be obtained, for example, directly from HR-ToF-AMS measurements [31] and OA factor volatilities can be measured with thermal denuders [57, 60]. Recent thermal denuder measurements indicate that ambient POA is more volatile than assumed in current models whereas ambient SOA becomes increasingly less volatile with photochemical aging, and is much less volatile than the SOA produced in smog chambers [58–60].

Recent studies have also investigated the climate-relevant properties of ambient OA factors such as their cloud condensation nuclei (CCN) activity [112–116] and hygroscopicity [13, 59, 117]. A consistent picture is emerging from the different measurements, in which fresh HOA seems to be non-hygroscopic and more oxygenated species take up more water as their O/C increases. In ship-based measurements, for example, Quinn et al. [114] showed that the variation in the critical diameter for CCN activation of marine, urban, and industrial ambient aerosol at a supersaturation of 0.44% was correlated with the HOA mass fraction in particles with dva < 200 nm. Raatikainen et al. [59] have also shown that the hygroscopicity of ambient OA factors increases with increasing oxidation, which is consistent with laboratory and field measurements that show a positive correlation between hygroscopicity and overall O/C ratio [13, 117, 118].

Use of factor analysis results in regional and global modeling

Despite recent advances, better descriptions of the properties and evolution of both primary and secondary OA are needed in regional and global models [7, 10, 11]. For instance, the emission inventories and the evaporation upon dilution of POA are not well constrained [119]. Explicit modeling of all important SOA chemical reactions and species is computationally prohibitive for large-scale models, and simpler modeling based on laboratory experiments often does not reproduce the observed concentration or dynamic evolution of SOA in polluted regions [42], although it performs better over clean biogenic regions [67]. Thus, there is a continuing need for field data that can be used to accurately constrain and test model predictions.

In this context, the reduced complexity in OA factor analysis results is particularly useful for development and evaluation of air quality and climate models. OA factors extracted from factor analysis of highly time-resolved aerosol mass spectra correspond to lumped groups of molecules that are linked to each other by similar sources or processes. Thus, the observed mass concentrations and bulk composition (e.g., O/C) of these factors provide useful information for constraining theoretical predictions of the spatial and temporal evolution of key OA sources. For example, the factor analysis results of AMS data acquired from multiple stationary and aircraft platforms during the MILAGRO 2006 experiment in Mexico City have been used to evaluate box and regional scale models [120–123]. The concentrations of POA components (= HOA + BBOA) measured during the MILAGRO experiment have been used to constrain and evaluate the accuracy of the Mexico City emission inventories for primary sources [123] and several recent SOA modeling approaches have been tested against measured mass concentrations of OOA [120, 121, 124]. As an example, Fig. 12 shows a comparison of 3D model results using three different SOA modeling approaches and measurements obtained in MILAGRO [120]. The “REF” model reports the POA and SOA expected from primary emission inventories assuming that the primary emissions can be modeled as completely non-volatile and that SOA is dominated by aromatic precursors. The “ROB” model (based on Ref. [119]) treats primary emissions as semi-volatile species (SVOCs) that can evaporate from the particle phase after emissions and then react to form less volatile SOA. This model also includes intermediate volatility gas phase species (IVOCs), which are not typically measured or included in standard models, as precursors for measured SOA. The “GRI” model (based on Ref. [125]) has a similar structure to the “ROB” model, but with updated terms.
Figure 12 indicates that the models that include SVOCs and IVOCs are generally in better agreement with measured factor concentrations. However, the fact that neither the “ROB” nor the “GRI” predictions fully capture the observations indicates the need for improved models.

Fig. 12

Average diurnal profiles of the model simulations (red, blue, and green lines) and the AMS measurements (black dots) at the (a) T0 (urban center site) and (b) T1 (downwind suburban site) for the MILAGRO field experiment in Mexico City for the concentrations of HOA, BBOA, OOA, and total OA. The REF model case represents POA as non-volatile and only accounts for SOA formation from aromatic species. The ROB (red line) and GRI (green line) cases consider POA as semi-volatile and also account for SOA formation from semivolatile and intermediate volatility species (see text). The variability associated with average observations and ROB model predictions is given in the gray shaded area and red vertical bars, respectively. (Adapted from Ref. [120], reprinted with permission)

The worldwide factor analysis results from the AMS data shown in Fig. 9 were recently used to constrain SOA sources using a global chemical transport model [126]. In this case the observed OOA concentrations were used to optimize the SOA source strengths from lumped biogenic, anthropogenic, and biomass burning sources. IMPROVE network measurements were then used to independently verify the optimized SOA predictions. The optimized model yields a global SOA source of 140 Tg SOA a⁻¹. Of this source, 7% (10 Tg a⁻¹) is estimated to be anthropogenic SOA from urban and/or industrial VOCs, 64% (90 Tg a⁻¹) is estimated to be anthropogenically controlled SOA formed primarily from biogenic VOCs, and 9% (13 Tg a⁻¹) is estimated to be SOA of biogenic origin without anthropogenic input. SOA from biomass burning VOCs accounts for 3% (3 Tg a⁻¹) of the global SOA source, and oxidation of POA (mostly BBOA) accounts for 16% (23 Tg a⁻¹) of the SOA source. The scatter plots of measured vs. modeled OOA concentrations using the initial (biogenic SOA-dominated) and final optimized models are shown in Fig. 13. Most of the field measurements included in this study were obtained in the northern hemisphere. Thus, the usefulness of this approach for constraining global SOA sources will be further enhanced when more factor analysis results from the southern hemisphere become available.

Fig. 13

Scatterplot of simulated (GLOMAP) versus observed (AMS; data shown in Fig. 9) OOA data using (a) the initial biogenic dominated model and (b) the final optimized model. Observation locations are classified as urban (black circles), urban-downwind (blue triangles), and rural and/or remote (red squares) as in Zhang et al. [6]. The 1:1 line (solid), 2:1 lines (dashed), and 10:1 lines (dotted) are shown. (Figs. 3d and 4d, respectively, in Ref. [126]. Copyright 2011, reprinted with permission)

In addition, the OA factor results from six AMS field measurement campaigns were used by Ervens et al. [115] to evaluate the extent to which simple assumptions of OA composition and mixing states can reproduce measured CCN number concentrations in ambient air. These authors concluded that a simple treatment of CCN composition and/or mixing state as a function of distance from sources predicts CCN concentration and cloud drop concentration with reasonably good accuracy (~15%). This finding may have important implications for large-scale modeling of aerosol–cloud interactions. Furthermore, Wex et al. [127] reported the utilization of the global OA factor results shown in Fig. 9 for evaluating global-scale CCN predictions.

Conclusions and future research

The recent development of quantitative aerosol mass spectrometers capable of reporting highly time-resolved organic aerosol composition data has enabled a new application of factor analysis techniques. The most useful techniques are based on mass conservation and enable estimation of the time series and mass spectra of OA factors, each of which represents the contributions of hundreds of different chemical species and can be associated with different sources and/or processes. Aerodyne AMS instruments have so far dominated this field, with only three publications using data from other real-time aerosol mass spectrometers. PMF, a positively constrained, error-weighted variant of the bilinear model, is the technique most commonly used in factor analysis applications. The criteria for selecting the optimum PMF solutions are complex, and have been summarized here. The uncertainty of the PMF factors may be evaluated on the basis of variations in plausible solutions obtained with different FPEAK and SEED values, or via bootstrapping analysis in which the mass spectra are resampled with replacement. Some advanced applications have been reported using, e.g., a unified dataset combining aerosol mass spectrometry and gas-phase VOC data, additional constraints on the mass spectra, or three-dimensional models. The components most commonly identified in the data include several primary emission factors, for example hydrocarbon-like OA, biomass-burning OA, and cooking OA, and several secondary process-related factors under the umbrella of oxygenated organic aerosols (OOAs). Fresh OOAs are more variable, whereas aged OOAs seem to converge into non-volatile and highly oxygenated OAs, which are likely to be acid-dominated. The results from these analyses are being used to evaluate multiple models, including both aerosol process models and regional and global chemical-transport models.
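As a compact illustration of two ideas summarized above, the error-weighted fit quantity of the bilinear model and bootstrap resampling of mass spectra, a minimal pure-Python sketch follows. The symbols `X` (data matrix, rows are mass spectra in time), `G` (factor time series), `F` (factor mass spectra), and `sigma` (measurement uncertainties), the function names, and all numerical values are assumptions made for this sketch. The fitting step itself, which PMF performs with dedicated solvers, is not reproduced.

```python
import random


def pmf_q(X, G, F, sigma):
    """Error-weighted sum of squared residuals for the bilinear model X ~ G F."""
    n_rows, n_cols, n_factors = len(X), len(X[0]), len(F)
    # Non-negativity of both factor matrices is the "positive constraint".
    if any(v < 0 for row in G for v in row) or any(v < 0 for row in F for v in row):
        raise ValueError("G and F must be non-negative")
    q = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            model = sum(G[i][p] * F[p][j] for p in range(n_factors))
            q += ((X[i][j] - model) / sigma[i][j]) ** 2
    return q


def bootstrap_rows(X, sigma, rng):
    """Resample mass spectra (rows) with replacement, keeping errors paired."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [sigma[i] for i in idx], idx


# Toy data: 3 time steps, 4 m/z channels, 2 factors (all values invented).
G = [[1.0, 0.5], [0.2, 2.0], [0.0, 1.0]]          # factor time series
F = [[0.6, 0.3, 0.1, 0.0], [0.1, 0.1, 0.3, 0.5]]  # factor mass spectra
X = [[sum(G[i][p] * F[p][j] for p in range(2)) for j in range(4)] for i in range(3)]
sigma = [[0.1] * 4 for _ in range(3)]

q_exact = pmf_q(X, G, F, sigma)  # data built from G and F, so residuals vanish

X_noisy = [row[:] for row in X]
X_noisy[0][0] += 0.2             # a single residual of about 2 sigma
q_noisy = pmf_q(X_noisy, G, F, sigma)

# One bootstrap replicate; in practice the factorization would be refit to
# each replicate and the spread of the resulting factors examined.
Xb, sigma_b, idx = bootstrap_rows(X, sigma, random.Random(0))
print(q_exact, q_noisy, idx)
```

In this sketch a perfect reconstruction gives a fit quantity of zero, and one residual of two standard deviations contributes about four, which shows how each residual is scaled by its own uncertainty before squaring.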

Some methodological aspects require further research, for example improvement of error estimates, better methods for determining proper relative weights when combining datasets, and methods to account for the variability in component mass spectra. Factor analysis of datasets acquired with improved soft ionization aerosol mass spectrometers, which retain information on individual organic molecules and larger fragments, and of multidimensional datasets (e.g., acquired from thermodenuder–AMS and fast chromatography–MS) might help resolve some of the current uncertainties associated with OA sources and processes.

Finally, more effort should be devoted to integrated analysis and interpretation of the factor analysis results of worldwide aerosol mass spectrometry datasets acquired across a wide range of geographic locations and on different time scales. These results enable the development of holistic and simplified views of OA climatology, which are necessary for constraining global and regional models and for advancing current knowledge of the roles of aerosols in global climate change and in the degradation of human health and ecosystems.