An open-source software package for multivariate modeling and clustering: applications to air quality management

  • Research Article
  • Environmental Science and Pollution Research

Abstract

This paper presents rSCA, an open-source software package built on the stepwise cluster analysis (SCA) method, which serves as a statistical tool for modeling the relationships between multiple dependent and independent variables. The package efficiently handles both continuous and discrete variables, as well as nonlinear relationships between them. It divides the sample sets of the dependent variables into subsets (subclusters) through a series of cutting and merging operations based on the theory of multivariate analysis of variance (MANOVA). The modeling results take the form of a cluster tree, which contains the intermediate and leaf subclusters as well as the flow path from the root of the tree to each leaf subcluster, defined by the sequence of cutting and merging actions. rSCA is easy to use and freely available at http://cran.r-project.org/package=rSCA. By applying the package to air quality management in an urban environment, we demonstrate its effectiveness in handling complex relationships among multiple variables in real-world problems.
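The cutting, merging, and tree-traversal ideas in the abstract can be illustrated with a short sketch. It assumes the classical SCA criterion, which may differ in detail from what rSCA implements: a cluster is cut in two at a threshold of one independent variable, the cut is scored with Wilks' Lambda, Λ = |W| / |W + B|, computed from the within-group (W) and between-group (B) SSCP matrices of the p dependent variables, and Λ is converted to an F statistic. For two groups this conversion is exact: F = (1 − Λ)/Λ · (n − p − 1)/p with (p, n − p − 1) degrees of freedom. This is not the rSCA API. All function names (`wilks_lambda`, `best_cut`, `should_merge`, `grow`, `predict`) are hypothetical. The critical value `f_crit` is supplied by the caller (for example from an F table or `scipy.stats.f.ppf`) and is held fixed, even though the degrees of freedom change from cluster to cluster. Merging is reduced to a pairwise test rather than a full merge pass.

```python
import numpy as np


def wilks_lambda(Y, mask):
    """Wilks' Lambda |W| / |T| for splitting the rows of Y (n samples x p
    dependent variables) into the two groups Y[mask] and Y[~mask]."""
    Y = np.asarray(Y, dtype=float)
    p = Y.shape[1]
    W = np.zeros((p, p))
    for g in (Y[mask], Y[~mask]):
        d = g - g.mean(axis=0)  # deviations from the group mean
        W += d.T @ d            # accumulate the within-group SSCP matrix
    d = Y - Y.mean(axis=0)      # deviations from the pooled mean
    T = d.T @ d                 # total SSCP matrix, T = W + B
    det_t = np.linalg.det(T)
    # A degenerate cluster (e.g. constant responses) cannot be separated.
    return 1.0 if det_t <= 0 else np.linalg.det(W) / det_t


def f_statistic(lam, n, p):
    """Exact two-group F transform of Wilks' Lambda, df = (p, n - p - 1)."""
    df = (p, n - p - 1)
    F = np.inf if lam <= 0 else (1.0 - lam) / lam * df[1] / p
    return F, df


def best_cut(X, Y, min_size=3):
    """Try every independent variable (column of X) and every threshold and
    return the two-way cut with the smallest Wilks' Lambda, or None."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    n, p = Y.shape
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:  # the largest value cannot split
            mask = X[:, j] <= t
            if min(mask.sum(), (~mask).sum()) < min_size:
                continue
            lam = wilks_lambda(Y, mask)
            if best is None or lam < best["wilks_lambda"]:
                best = {"variable": j, "threshold": float(t), "wilks_lambda": lam}
    if best is not None:
        best["F"], best["df"] = f_statistic(best["wilks_lambda"], n, p)
    return best


def should_merge(Y_a, Y_b, f_crit):
    """Merge two subclusters when their dependent-variable samples are not
    significantly different, i.e. their F statistic is below f_crit."""
    Y = np.vstack([Y_a, Y_b]).astype(float)
    mask = np.arange(len(Y)) < len(Y_a)
    F, _ = f_statistic(wilks_lambda(Y, mask), len(Y), Y.shape[1])
    return F < f_crit


def grow(X, Y, f_crit, min_size=3):
    """Cut a cluster recursively while the best cut is significant. Leaves keep
    the mean of their dependent variables; the merge pass is omitted here."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    cut = best_cut(X, Y, min_size)
    if cut is None or cut["F"] < f_crit:
        return {"leaf": Y.mean(axis=0)}
    mask = X[:, cut["variable"]] <= cut["threshold"]
    return {"variable": cut["variable"], "threshold": cut["threshold"],
            "le": grow(X[mask], Y[mask], f_crit, min_size),
            "gt": grow(X[~mask], Y[~mask], f_crit, min_size)}


def predict(tree, x):
    """Follow the flow path from the root to a leaf and return the leaf mean."""
    while "leaf" not in tree:
        branch = "le" if x[tree["variable"]] <= tree["threshold"] else "gt"
        tree = tree[branch]
    return tree["leaf"]
```

Calling `grow(X, Y, f_crit)` on an n × q matrix of independent variables and an n × p matrix of dependent variables returns a nested dictionary that plays the role of the cluster tree. `predict` walks the same flow path a new sample would follow from the root to a leaf subcluster. This sketch greedily cuts one cluster at a time. A fuller SCA implementation would cut all eligible clusters at each step and then attempt merges before continuing.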

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8

Acknowledgments

This research was supported by the Program for Innovative Research Team in University (IRT1127), the 111 Project (B14008), and the Natural Sciences and Engineering Research Council of Canada.

Author information

Correspondence to Guohe Huang.

Additional information

Responsible editor: Marcus Schulz

Electronic supplementary material

Text S1

Description of the functionality of the rSCA package. (PDF 209 kb)

Text S2

Sample codes and outputs of the application for multivariate modeling. (PDF 184 kb)

Text S3

Sample codes and outputs of the application for multivariate clustering. (PDF 166 kb)

About this article

Cite this article

Wang, X., Huang, G., Zhao, S. et al. An open-source software package for multivariate modeling and clustering: applications to air quality management. Environ Sci Pollut Res 22, 14220–14233 (2015). https://doi.org/10.1007/s11356-015-4664-7

  • DOI: https://doi.org/10.1007/s11356-015-4664-7
