Determining the number of factors for non-negative matrix and its application in source apportionment of air pollution in Singapore

  • Original Paper
Stochastic Environmental Research and Risk Assessment

Abstract

Non-negative matrix factorization has been used in many research disciplines, and the number of factors plays a crucial role in it. However, a fully data-driven method for determining this number is not yet available in the literature. Based on the fact that the most appropriate number of factors should generate the best prediction, in this paper we propose a selection method using a two-step delete-one-out approach, called twice cross-validation. This method is easy to implement and fully data-driven. It also works when constraints, including sparsity, are imposed on the factorization. Intensive simulations and real data analyses suggest that the proposed method performs well in most cases and can select the number of factors correctly when the number of factors is much smaller than the dimension of the variables and the sample size is reasonably large. As an important application, the proposed method is used for source apportionment of air pollution in Singapore, where it provides physically reasonable source profiles.
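The abstract does not spell out the steps of twice cross-validation, so the following is only a minimal sketch of the underlying idea that the best number of factors should give the best prediction. It is not the authors' procedure: it swaps in a generic entry-wise hold-out cross-validation, fitted with masked multiplicative updates. The function names `masked_nmf` and `cv_select_rank`, the fold count, the iteration count and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def masked_nmf(X, mask, rank, n_iter=300, seed=0, eps=1e-9):
    """Fit X ~ W @ H with W, H >= 0, using only entries where mask == 1."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W = rng.random((n, rank)) + 0.1
    H = rng.random((rank, p)) + 0.1
    MX = mask * X
    for _ in range(n_iter):
        # Weighted multiplicative updates: masked-out entries contribute nothing.
        W *= (MX @ H.T) / ((mask * (W @ H)) @ H.T + eps)
        H *= (W.T @ MX) / (W.T @ (mask * (W @ H)) + eps)
    return W, H


def cv_select_rank(X, ranks, n_folds=5, seed=0):
    """Return (best_rank, errors), where errors maps each candidate rank
    to its mean squared prediction error on held-out entries."""
    rng = np.random.default_rng(seed)
    # Assign every matrix entry to a random fold (hypothetical splitting rule).
    folds = rng.integers(0, n_folds, size=X.shape)
    errors = {}
    for r in ranks:
        sse, count = 0.0, 0
        for k in range(n_folds):
            train = (folds != k).astype(float)
            W, H = masked_nmf(X, train, r, seed=seed)
            held_out = folds == k
            resid = (X - W @ H)[held_out]
            sse += float(resid @ resid)
            count += int(held_out.sum())
        errors[r] = sse / count
    return min(errors, key=errors.get), errors


if __name__ == "__main__":
    # Synthetic non-negative data built from 3 factors (illustrative only).
    rng = np.random.default_rng(1)
    X = rng.random((60, 3)) @ rng.random((3, 12)) + 0.01 * rng.random((60, 12))
    best, errors = cv_select_rank(X, ranks=[1, 2, 3, 4, 5])
    print("selected number of factors:", best)
    for r, e in errors.items():
        print(r, e)
```

Holding out individual entries rather than whole rows lets every row and column keep some training data, so the held-out values can be predicted from the fitted factors. Constraints such as sparsity would have to be added to the update step and are not covered by this sketch.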



Acknowledgements

We are most grateful to the AE and two referees for their valuable comments and constructive suggestions, which have led to a substantial improvement of this paper. YC Xia’s research is partially supported by MOE Tier 1 Grant R-155-000-193-114, MOE Grant of Singapore MOE2014-T2-1-072, and National Natural Science Foundation of China Grant 11771066.

Author information

Corresponding author

Correspondence to Yingcun Xia.

Cite this article

Yan, M., Yang, X., Hang, W. et al. Determining the number of factors for non-negative matrix and its application in source apportionment of air pollution in Singapore. Stoch Environ Res Risk Assess 33, 1175–1186 (2019). https://doi.org/10.1007/s00477-019-01677-z

  • DOI: https://doi.org/10.1007/s00477-019-01677-z
