Contribution of point and small-scaled sources to the PM10 emission using positive matrix factorization model

Background: Positive matrix factorization (PMF) is a powerful environmental analysis technique that has been successfully used to assess the source contributions of airborne particulate matter. The new version of this model (PMF5) has two additional error estimation methods and some other useful advantages compared with previous versions. In the present study, the capability of PMF5 to identify small-scale particle sources and to quantify their contribution to ambient particulate matter was evaluated.
Methods: The study area is surrounded by three industrial complexes and two dumps of tailing soils from mining activities and related manufacturing. Ambient particulate matter was sampled at two sites in the urban area of Zanjan (Iran), and the 196 collected samples were analyzed for 15 chemical elements.
Results: At the downtown site, the identified factors (and their contributions to particulate matter) were soil particles (40.36%), fuel combustion and traffic (26.8%), tailing soils (lead and zinc) (21.32%), and nickel and industrial emission (5.7%). At the residential site, the identified factors (and their contributions to particulate matter) were general industrial emission (28.2%), tailing soils (lead and zinc) (39.2%), soil (25.8%), and cadmium and general pollutants (6.7%).
Conclusion: The results of the PMF5 modeling indicated that the applied model could identify the dumps of tailing soils as a separate factor. The other particulate matter sources in the studied area were traffic, fuel combustion, soil particles and industrial pollutants.


Background
The correlation between high concentrations of airborne particulate matter (PM) and morbidity and/or mortality has been shown in many studies [1]. Airborne particles with an aerodynamic diameter of less than 10 μm (PM10) can penetrate into the lungs and carry toxic chemicals into the human body [2]. Construction of industrial centers near cities, traffic, transportation, mining, agriculture and construction activities are the main anthropogenic sources of airborne particles. The size and composition of the particles determine the degree of penetration into the lungs and the harmful effects on human health [3]. These parameters depend on the sources of the particles; therefore, source identification and apportionment of airborne particles are two basic measures in urban air quality management systems. Particulate matter contains organic and inorganic chemicals. Among the inorganic compounds, heavy metals are the most important owing to their harmful effects on the environment and human health.
For that reason, many source apportionment methods have been applied, based on the statistical evaluation of data, emission inventories or dispersion models, and the evaluation of monitoring data. Chemical mass balance (CMB), factor analysis, principal component analysis (PCA), multiple linear regression and positive matrix factorization (PMF) have been used by researchers in many studies [4]. PMF is a widely used multivariate method that can find the main sources of particles without prior knowledge of the sources. The model fundamentally resolves the identities and contributions of components in an unknown mixture. The most important advantage of this model is that it can incorporate the variable uncertainties associated with environmental sample measurements [5].
Mining activities such as concentrated soil preparation for material (metal) extraction and metal melting are accompanied by the production of great amounts of tailing soils. These soils contain fine particles which can readily be re-suspended into the atmosphere by wind.
These particles contain a high percentage of metals, especially heavy metals such as cadmium, lead, and chromium, whose distribution on the ground can pose serious threats to public health.
Iran accounts for three percent of the total world reserves of lead and zinc and is the fourth largest producer of lead and zinc concentrated soil in Asia, after China, Kazakhstan and India. Asia produces about 45% of the world's lead. Angooran, the largest zinc and lead mine in Iran, is located in Zanjan province [20]. Because of the mine's proximity, more than 100 factories producing concentrated soil and ingots of lead and zinc have been established around Zanjan. These factories use raw material from the Angooran mine, and in recent years industrialists have also been importing raw materials from other provinces. This development of zinc and lead industries in a small area results in the production of millions of tons of tailing soils, which are accumulated on open ground around the factories without any environmental considerations. These soils, which are named filter cakes, are produced by filtering acid-leached concentrated soils and consist of very fine particles. These particles can easily be emitted and re-suspended into the atmosphere by wind, move towards the city center and residential zones, and threaten public health. Emission of these particles and their settlement on agricultural land can cause soil and groundwater pollution and allow the metals to enter the food chain.
In addition to the zinc and lead industries, there are several small- and large-scale industrial areas. The city is surrounded by three industrial complexes (Industrial Complex No. 1 in the north-west, Industrial Complex No. 2 in the south-west, and the Zinc Industrial Complex in the south-west). The largest lead production factory in Iran (National Iranian Lead and Zinc Company) is also located to the east of Zanjan. In these industrial complexes, more than 200 manufacturers are active and release large amounts of air pollutants into the atmosphere. In recent years, the contribution of openly dumped tailing soils to airborne PM in Zanjan has become a main concern of environmental and governmental organizations.
Most of the published studies have focused on large-scale and non-point particle sources such as combustion, traffic, soil and industrial activities. However, limited research has been done on the contribution of small-scale and point sources to particle emission. The present study aimed to identify and apportion the sources of emitted particles, with an emphasis on unburied tailing soils as a small-scale point source near an urban area, using EPA-PMF 5. Since the chemical characteristics of these soils are very similar to those of crustal soil, assessing the capability of EPA-PMF 5 to differentiate between these two similar particle sources is another objective of this study. Although the fifth version of the EPA-PMF model was first introduced in 2014, few studies have used this new version [21].
Compared with EPA-PMF 3, EPA-PMF 5 has two additional error estimation methods that are very useful in determining the number of factors [21]. Most studies in this field include little discussion of how the number of factors was determined. In the present study, the number of factors was determined using the error estimation methods (displacement and bootstrapping), following the approach presented by Brown et al. [22].

Description of the study area
Zanjan (the capital of Zanjan province) is situated in the north-west of Iran (latitude 36°41′ N, longitude 48°27′ E) at an average elevation of 1620 m above sea level. The city had a population of about 400,000 in an area of 81 km² in 2015. The climate of Zanjan is cold semi-arid, with hot dry summers and cold moist winters. The mean annual air temperature is 10 °C, the mean annual rainfall is 295 mm, and the prevailing wind is easterly with an average speed of 3 m/s. The surrounding areas of Zanjan are characterized by light to heavy industrial complexes. There are two dumps of tailing soil nearby. The first dump is located beside the Zinc Industrial Complex, has an area of around 1 km², and contains more than three million tons of tailing soil discharged by about 100 zinc factories. The second is situated beside the National Iranian Lead and Zinc Company and has an area of less than 1 km². The locations of the industrial complexes and the open dumps of tailing soils are shown in Fig. 1.

Data collection
A high-volume sampler (TCR-Tecora) was used to collect ambient PM10 samples at a flow rate of 16.7 L/min. Daily 24-h PM10 samples were collected weekly from July 2013 to July 2015 (192 samples in total). The sampling operation was conducted according to the USEPA IO-2.1 method (1999) [23]. Quartz fiber filters were used for sampling PM10. The samples were then immediately transferred to the laboratory for digestion. A microwave digester (Sineo, model MDS-10) was employed for digestion using the microwave method of USEPA IO-3.1 (1999) [24]. The digested samples were analyzed by the inductively coupled plasma method of USEPA IO-3.4 (1999) [25]. ICP-OES (Spectro) was used to analyze traces of Ca, Ni, Pb, Zn, Al, As, Sb, V, Hg, Cd, Cr, Cu, Ti, Mn, and Fe in the PM10 samples. The operating parameters of the ICP-OES are given in Table 1. The mass of PM10 in the samples was measured using the gravimetric method.
To determine the sample size for this research, 30 samples were first collected at each site; after the standard deviation of the metal concentrations had been calculated, the sample size was computed using a statistical formula. Because the standard deviation of the data was low, the resulting sample size was small. However, a valid solution with the PMF model requires a sample size of at least 100 [19]. Therefore, 100 samples were collected at each study site. Four samples were removed because of a local storm, and modeling was carried out with 96 samples from each site (192 samples in total).
Two monitoring sites were selected in the urban area. Site 1 is downtown, where the traffic load is heavy. This zone is the center of commercial activities and lies just south of the Zanjan-Tabriz highway. Air pollutants originating from the industrial complexes affect the air quality at this site. Site 2 is in the north-east of Zanjan. This zone is purely residential, and its traffic load and commercial activity are low. The distance between the two sites is about four kilometers.

Data analysis method
Positive matrix factorization (EPA-PMF 5), a multivariate receptor-based model, was used for source apportionment and characterization of the collected PM10 [26]. A PMF model assumes that p factors (sources) contribute to the samples at a receptor site, which can be stated with the following equation:

$$X_{ij} = \sum_{k=1}^{p} g_{ik} f_{kj} + E_{ij}$$

where
$X_{ij}$ is the concentration of species j in the ith sample,
$g_{ik}$ is the contribution of the kth factor to the ith sample,
$f_{kj}$ is the fraction of the kth factor that is species j, i.e. the chemical composition profile of factor k, and
$E_{ij}$ is the residual for the jth species in the ith sample.

The factor contributions ($g_{ik}$) and source profiles ($f_{kj}$) are estimated by the PMF model by minimizing the objective function Q:

$$Q = \sum_{i=1}^{n} \sum_{j=1}^{m} \left( \frac{E_{ij}}{u_{ij}} \right)^{2}$$

where $u_{ij}$ is the uncertainty of the jth species in the ith sample, and n and m are the numbers of samples and species, respectively.
EPA PMF seeks the solution that minimizes Q, the sum of squares of the uncertainty-scaled residuals. In EPA PMF 5, two versions of Q are calculated and displayed for the model runs:
1) Q(true) is the goodness-of-fit parameter calculated including all points.
2) Q(robust) is the goodness-of-fit parameter calculated excluding points not fit by the model, which are the samples with an uncertainty-scaled residual greater than 4.
The difference between these two values of Q reflects the impact of data points with high scaled residuals [21].
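To make the two goodness-of-fit values concrete, the following minimal Python sketch computes the uncertainty-scaled residuals of a given factorization and then Q(true) and Q(robust) as defined above. It is an illustration only, not the EPA PMF 5 implementation: the function names, the toy matrices, and the plain exclusion of points whose absolute scaled residual exceeds 4 are assumptions made here.

```python
def scaled_residuals(X, G, F, U):
    """Return (X_ij - sum_k g_ik * f_kj) / u_ij for matrices given as lists of rows."""
    n_samples, n_species, n_factors = len(X), len(X[0]), len(F)
    scaled = []
    for i in range(n_samples):
        row = []
        for j in range(n_species):
            fitted = sum(G[i][k] * F[k][j] for k in range(n_factors))
            row.append((X[i][j] - fitted) / U[i][j])
        scaled.append(row)
    return scaled


def q_values(scaled, cutoff=4.0):
    """Return (Q_true, Q_robust); Q_robust skips points with |scaled residual| > cutoff."""
    flat = [r for row in scaled for r in row]
    q_true = sum(r * r for r in flat)
    q_robust = sum(r * r for r in flat if abs(r) <= cutoff)
    return q_true, q_robust


# Hypothetical toy data: 3 samples, 2 species, 1 factor.
X = [[1.5, 2.0], [2.0, 4.0], [3.0, 12.0]]   # measured concentrations
G = [[1.0], [2.0], [3.0]]                   # factor contributions g_ik
F = [[1.0, 2.0]]                            # factor profile f_kj
U = [[0.5, 0.5], [0.5, 0.5], [0.5, 1.0]]    # uncertainties u_ij

q_true, q_robust = q_values(scaled_residuals(X, G, F, U))
print(q_true, q_robust)
```

In this toy case one point has a scaled residual of 6, so it counts towards Q(true) but not Q(robust); a large gap between the two values signals that a few poorly fitted points dominate the fit.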
Two input files, containing the data and the uncertainty values, were prepared according to the method described in the PMF5 manual. PM10 concentrations were included in the data file (first input file) as the independent variable [11]. The uncertainty values were calculated as described below and were included in the uncertainty file (second input file).
Determining the uncertainty of each measured value is a prerequisite for the application of PMF. In PMF, the weight of missing and below-detection-limit data is decreased by assigning them appropriate uncertainties [27]. To determine the uncertainties in the data, the standard deviation of repeated analyses of standard reference materials was used, and the method detection limit (MDL) of each species was calculated. PMF can down-weight missing data and values below the detection limit, and can reduce the influence of extreme values using the robust mode [1]. The uncertainties of the species were determined according to the methods and equations recommended in the EPA-PMF5 manual [21].
Concentrations below the MDL were substituted by 1/2 MDL, and 5/6 MDL was used as the corresponding uncertainty value [5].
If the concentration was greater than the MDL, the uncertainty was calculated with the equation given in the EPA PMF 5 user guide [21].
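The uncertainty rules of this subsection can be sketched in Python as follows. The equation for concentrations above the MDL is not reproduced in this text; the sketch assumes the form given in the EPA PMF 5 user guide, Unc = sqrt((error fraction × concentration)² + (0.5 × MDL)²). The function names, the error fraction of 0.1, and the example values are hypothetical and serve only as an illustration.

```python
import math


def substituted_concentration(conc, mdl):
    """Replace a concentration below the MDL by 1/2 MDL, as described in the text."""
    return 0.5 * mdl if conc < mdl else conc


def pmf_uncertainty(conc, mdl, error_fraction):
    """Uncertainty of one measurement.

    Above the MDL: assumed EPA PMF 5 user guide form,
    sqrt((error_fraction * conc)^2 + (0.5 * mdl)^2).
    Otherwise: the fixed value 5/6 * MDL.
    """
    if conc > mdl:
        return math.sqrt((error_fraction * conc) ** 2 + (0.5 * mdl) ** 2)
    return (5.0 / 6.0) * mdl


# Hypothetical example: MDL = 0.4, error fraction = 0.1 (units arbitrary).
for conc in (0.1, 3.0):
    print(conc, substituted_concentration(conc, 0.4), pmf_uncertainty(conc, 0.4, 0.1))
```

The first value lies below the MDL and therefore receives the fixed substitution and uncertainty; the second is handled by the concentration-dependent equation.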
As described by Norris et al. [21] and detailed in Paatero et al. [28], EPA PMF has two main error estimation methods, displacement (DISP) and bootstrapping (BS), as well as a useful rotation tool named Fpeak [21,28].
DISP captures the effects of rotational ambiguity but not those of random errors in the data. BS captures the effects of random errors and, partially, of rotational ambiguity.
In this study, the number of factors was determined on the basis of the variations in the values of Q(true), Q(robust) and Q/Qexpected. The identified sources of the trace metals in PM10 were interpreted physically on the basis of field information and wind directions.
The S/N calculation has been revised in PMF 5, as described in detail in the EPA PMF 5 user guide. To reduce the weight of species with low S/N in the solution, species with an S/N ratio of less than 1 were categorized as weak variables [21]. Most of the species had an S/N higher than 8; the reason for these high values is that the species were analyzed in PM10, so most of the concentrations were high.
Table 2 shows the correlation between the daily species concentrations at the two sites. The correlation coefficients (r) show that the species concentrations at the two sites are only very weakly correlated. This means that the patterns of particle emission and the origins of the metals differ between the two sites. It also indicates that the locations of the studied sites are suitable.
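The between-site comparison reported in Table 2 rests on the correlation coefficient r of paired daily concentrations. A minimal Python sketch of that calculation is given below, assuming the ordinary Pearson coefficient was used (the text does not name the variant); the function name and the two short series are hypothetical and are not the measured data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical daily concentrations of one species at the two sites.
site1 = [0.8, 1.1, 0.9, 1.4, 1.0, 1.2]
site2 = [2.1, 1.7, 2.4, 2.0, 1.6, 2.3]
print(round(pearson_r(site1, site2), 2))
```

Values of r close to zero for most species, as reported for the two sites, are what one would expect when the sites are influenced by different source mixtures.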

Results and discussion
Because the metal concentrations vary widely between days and seasons, the species data cannot be interpreted without mathematical tools or models.
Tables 3 and 4 list the quantified species along with the maximum, minimum, median and percentiles of their concentrations. Species with concentrations lower than the MDL in most of the samples and/or S/N ratios lower than two were categorized as Bad. PM10 and As were categorized as weak in the modeling of site 1. For the modeling of the site 2 data, the PM10 mass and V were categorized as weak, and As and Hg were categorized as Bad. Modeling of the data by PMF was started with 100 runs, and the run with the minimum Q was retained for the four-, five-, and six-factor solutions and for the error estimation methods. BS was run with 100 bootstraps, a block size of 3 and a minimum correlation R-value of 0.6. The results demonstrated that most of the species were well modeled. Figure 2 shows the pie charts of the factor contributions to PM10.
At the five-factor solution, an additional factor with Cd as its key species was separated from the soil and tailing soil factors. At the four-, five-, and six-factor solutions, all key species were well predicted; that is, the r² values of their observed/predicted scatter plots were greater than 0.8 [22].
The achieved values of Q/Qexpected were 1.15, 0.63 and 0.49 at the four-, five-, and six-factor solutions, respectively. The decrease in Q/Qexpected in moving from four to five factors (0.52) was larger than that in moving from five to six factors (0.14). This indicates that six factors may be too many to fit; therefore, five factors may be the optimal solution [22]. In the BS runs with the four-factor solution, the factors were mapped in 96%, 99%, 95%, and 100% of the runs; in the DISP run there were no swaps at any of the dQmax levels, and the largest observed drop in Q during DISP was 0.1%. At the five-factor solution, factor four was mapped in 96% of the runs and the others were mapped in 100% of the runs; in the DISP run there were no swaps at the first two dQmax levels (4 and 8), and the largest observed drop in Q during DISP was 0.09%.
In the six-factor solution, the factors showed more swaps in the DISP run, and the BS factors were not mapped to the base factors. These results showed that there were significant rotational ambiguity and random error in this solution.
In the present study, the error estimation interval ratios of the key species of each factor, as presented by Paatero et al. [28], were used to compare the results. The DISP and BS interval ratios at the four- and five-factor solutions were almost equal to each other and considerably lower than those at the six-factor solution. These values indicate that there is little rotational ambiguity and low estimation error in these solutions. Higher interval ratios indicate higher uncertainty for key species [22].
Considering the BS mapping, the DISP swaps and the interval ratios of the three solutions, it can be concluded that the five-factor solution is the best and is more stable than the other solutions.
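The Q/Qexpected comparison used above for site 1 is simple arithmetic and can be reproduced with the short Python sketch below. The ratios are the values reported in the text; the function name is hypothetical, and the sketch only computes the successive decreases, it does not replace the DISP and BS diagnostics.

```python
def q_ratio_drops(q_ratio_by_factors):
    """Return the decrease in Q/Qexpected between consecutive factor numbers."""
    counts = sorted(q_ratio_by_factors)
    return {
        (a, b): q_ratio_by_factors[a] - q_ratio_by_factors[b]
        for a, b in zip(counts, counts[1:])
    }


# Q/Qexpected reported for site 1 at the four-, five-, and six-factor solutions.
site1 = {4: 1.15, 5: 0.63, 6: 0.49}
drops = q_ratio_drops(site1)
for (a, b), drop in drops.items():
    print(f"{a} -> {b} factors: Q/Qexpected falls by {drop:.2f}")
```

The much smaller decrease from five to six factors than from four to five is the pattern that the text interprets as a sign that the sixth factor adds little to the fit.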

Site 2
Site 2 is located in a residential zone in the north-east of Zanjan. The results of the modeled data of site 2 are demonstrated in Figs. 2 and 4.
The PMF model was run with four-, five-, and six-factor solutions, and BS and DISP were also run to estimate the errors in each solution, with parameters similar to those used for the site 1 data modeling.
At the four-factor solution, the identified factors (and their contributions to PM10) were as follows: general industrial emission (28.2%), tailing soils (lead and zinc) (39.2%), soil (25.8%), and cadmium and general pollutants (6.7%). The correlation coefficients in the observed/predicted scatter plots for the key species of the identified factors were higher than 0.8.
The achieved values of Q/Qexpected were 1.38, 0.98 and 0.7 at the four-, five-, and six-factor solutions, respectively. The decrease in Q/Qexpected when moving from the four-factor to the five-factor solution was larger than when moving from five to six factors. To interpret the number of factors more accurately, it is necessary to analyze the solutions using the DISP and BS methods. At the four- and five-factor solutions, there were no swaps in the DISP runs at dQmax = 4. The values of dQ% in both solutions were lower than 1%. In the four-factor solution, all factors were mapped in more than 80% of the runs, whereas in the five-factor solution, factor 3 was mapped in only 41% of the runs. These results showed that there is a significant random error at the five-factor solution at this sampling site. The results of BS and DISP at the six-factor solution showed both random error and rotational ambiguity. In 95% of the cases, the calculated BS and DISP interval ratios of the key species at the four-factor solution were lower than those at the five- and six-factor solutions.
These results indicate that the uncertainty of the key species increases with the number of factors. They also indicate that the four-factor solution is the most interpretable solution, because it has the lowest random error and rotational ambiguity, as discussed above.
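The bootstrap mapping check used in the two site analyses can be expressed as a small Python sketch. The 80% cutoff is taken from the wording of the text ("mapped in more than 80% of the runs") and is treated here as an assumed acceptance threshold; the function name is hypothetical, and the assignment of the four site 1 percentages to factors 1 to 4 in the order listed is also an assumption.

```python
def poorly_mapped(mapping_percent, threshold=80.0):
    """Return the factors whose BS mapping percentage falls below the threshold."""
    return [name for name, pct in mapping_percent.items() if pct < threshold]


# Mapping percentages reported in the text (factor order for site 1 is assumed).
site1_four_factor = {"factor 1": 96.0, "factor 2": 99.0, "factor 3": 95.0, "factor 4": 100.0}
site2_five_factor = {"factor 3": 41.0}  # only this factor's value is reported

print(poorly_mapped(site1_four_factor))
print(poorly_mapped(site2_five_factor))
```

A factor that is reproduced in well under 80% of the bootstrap runs, such as factor 3 of the site 2 five-factor solution, is the kind of result the text reads as evidence of significant random error.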

Rotation solution
In modeling with EPA-PMF, the Fpeak tool is a useful method for rotating a solution. In this method, rows and columns of the F and G matrices are added to and/or subtracted from each other at different Fpeak strengths, and the %dQ is then calculated by the PMF program [5].
In the present study, Fpeak was run for the best solutions at both sites in order to introduce rotations to the solutions. The Fpeak strength was varied between -1 and 1, and the %dQ was examined as a function of Fpeak. The results showed that the non-rotated solutions (Fpeak = 0), with their corresponding Q values, were the most interpretable.

Identification of the sources
In this study, four common sources were identified at both sites. The identified sources were similar at the two sites, but their contributions to the particulate matter (PM10) differed, and so did their ranking: at site 1, re-suspended surface soil ranked first, whereas at site 2, the tailing soils of Pb and Zn ranked first. These differences are reasonable considering the low correlation coefficients between the sites. The characteristics of the identified sources are briefly presented below.
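The different ordering of the sources at the two sites can be read directly from the reported contributions. The following Python sketch ranks the factors of each site using the percentages given in this paper; the dictionary keys are shortened labels chosen here, and the listed site 1 shares do not add up to 100% because the remaining share is not itemized in the text.

```python
# Contributions to PM10 (%) as reported in the text.
site1 = {
    "soil": 40.36,
    "fuel combustion/traffic": 26.8,
    "tailing soils (Pb, Zn)": 21.32,
    "nickel and industrial emission": 5.7,
}
site2 = {
    "tailing soils (Pb, Zn)": 39.2,
    "general industrial emission": 28.2,
    "soil": 25.8,
    "cadmium and general pollutants": 6.7,
}


def ranked(contributions):
    """Return the factor names sorted from largest to smallest contribution."""
    return sorted(contributions, key=contributions.get, reverse=True)


for label, site in (("site 1", site1), ("site 2", site2)):
    print(label, ranked(site), "listed total:", round(sum(site.values()), 2))
```

The ranking makes the contrast explicit: soil leads downtown, while the tailing soils lead in the residential zone.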

Tailing soil (Pb, Zn) source
As mentioned in the introduction, Zanjan is the center of zinc and lead production in Iran. The activity of the related industries generates tailing soil, which is dumped on open ground without any environmental considerations. The two main tailing soil dumps are near Zanjan. One of them, with about 3 million tons of tailing soils, is located in the southeast, 5 km from the city, beside the complex of zinc factories (Zinc Industrial Complex). The second is located beside the National Iranian Lead and Zinc Company to the east of Zanjan, on the Zanjan-Tehran road, 12 km from the city. These locations are shown in Fig. 1. Re-suspension of these tailing soils by wind must have contributed to this factor.
The tailing soil factor is dominated by Pb, Zn, and Fe. To determine the concentrations of trace metals in the tailing soils, 20 samples were collected and analyzed using the same analytical method as for the PM10 samples. The results presented in Fig. 5 show that Pb, Zn, Ca and Fe are enriched in the dumped tailing soils. The percentages and concentrations of these species in the identified tailing soil factor were also high. The contributions of the tailing soil factor to PM10 at sites 1 and 2 were 21.32% and 39.2%, respectively. The main reason for the higher contribution at site 2 is the location of the tailing soil depot of the National Iranian Lead and Zinc Company to the east, together with the direction of the prevailing winds.
Fig. 6 The wind rose of Zanjan in the 8 sampling seasons

Fuel combustion/traffic
Another identified source was fuel combustion/traffic. This source is characterized by high concentrations of Fe, Ni, V, Cu, Zn and Ca [8,11,12]. In the present study, the fuel combustion factor contained significant loadings of these metals. Emissions from gasoline vehicles and from diesel and oil combustion in stationary and mobile sources must have contributed to this source. At the studied sites, traffic and industrial fuel combustion could not be separated into distinct sources. Lead-free gasoline is used in Iran; therefore, the concentration of lead in this factor was low. It has been found that Pb, Zn, and Cu are indicators of traffic emission [11]. Cu, Fe, and Zn are emitted from brake wear, and vehicle tires are the main source of Zn in road traffic [8].
The contribution of fuel combustion/traffic to PM10 at site 1 was about 26.8%, which is expected given the high traffic load at this site. In the residential area (site 2), general pollutants had a contribution of about 6.7%.

Soil or crustal source
One of the identified sources was represented by Al, Ca, Fe, Zn, and Mn, which is typical of soil sources. Re-suspension of particles from barren soils and arid lands is the major origin of these elements in PM10. The scarcity of water in recent years has intensified the re-suspension and dispersion of these particles in the city. Unpaved roads and construction sites also contribute to this factor, producing particles that carry these crustal elements.

Industrial emission
The factor identified as the industrial source includes several elements such as Fe, Mn, Al, Cr, Ni, Hg, and Zn. As mentioned before, Zanjan is surrounded by three main industrial complexes and by zinc and lead factories, which are regarded as the major industrial centers around the city.
Non-ferrous metallurgical industries such as copper smelting, lead recycling from used batteries, and electrical industries are the other sources of air pollution in the studied areas.
The contributions of the industrial factor to PM10 in the samples collected from sites 1 and 2 were 5.7% and 28.2%, respectively. At site 1, the contribution of Ni to the industrial emission factor was higher than that of the other metals. There are tens of plating units around the city center of Zanjan where Ni, Cr, and other plating metals are used, and these can be the source of Ni in the air.

The wind directions
The wind rose of Zanjan for the seasons of the sampling period is presented in Fig. 6. This figure shows that the prevailing winds blew primarily from E to W and secondarily from SE to NW. The contribution of tailing soil to particle emission at site 2 is higher than at site 1. This means that the particulate matter re-suspended from the tailing soil dump beside the National Iranian Lead and Zinc Company in the east is carried towards the city by the wind. In addition, general industrial pollutants are spread throughout the atmosphere of the city.

Conclusion
The results of the present study showed that the identified tailing soil factor is one of the major sources of PM10 in Zanjan, with a contribution of 21.32% downtown and 39.2% in the residential area. The two described tailing soil dumps, whose chemical characteristics are relatively similar to those of the soil particle source, are confined to two points in a small area. EPA PMF5 could successfully identify and apportion this source using a small sample size of 96 samples at each site. The identified factors and their contributions were consistent with the wind directions.