Estimation of PM10 concentration from air quality data in the vicinity of a major steelworks site in the metropolitan area of Avilés (Northern Spain) using machine learning techniques

Abstract

Atmospheric particulate matter (PM) is one of the pollutants that may have a significant impact on human health. Data collected over 7 years from the air quality monitoring station at the LD-III steelworks, belonging to the Arcelor-Mittal Steel Company, located in the metropolitan area of Avilés (Principality of Asturias, Northern Spain), are analyzed using four different mathematical models: vector autoregressive moving-average (VARMA), autoregressive integrated moving-average (ARIMA), multilayer perceptron (MLP) neural networks and support vector regression (SVR). The monthly average concentrations of the pollutants SO2, NO and NO2 and of PM10 (particles with a diameter less than 10 μm) are used as input to forecast the monthly average concentration of PM10 from 1 to 7 months ahead. Simulations showed that the ARIMA model performs better than the other models when forecasting 1 month ahead, while at the longer horizons, up to 7 months ahead, the best performance is given by support vector regression.
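As a rough illustration of the forecasting setup described in the abstract, the sketch below pairs the monthly averages observed at month t with the PM10 average at month t + h, for each horizon h from 1 to 7. The abstract does not specify how the inputs are lagged or arranged, so this layout is an assumption; the function name `make_horizon_dataset` and all numeric values are hypothetical and are not data from the study.

```python
from typing import Dict, List, Tuple


def make_horizon_dataset(
    series: Dict[str, List[float]],
    target: str,
    horizon: int,
) -> Tuple[List[List[float]], List[float]]:
    """Pair the values of every series at month t with the target at month t + horizon."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1 month")
    names = sorted(series)  # fixed, reproducible column order
    n = len(series[target])
    if any(len(series[name]) != n for name in names):
        raise ValueError("all series must have the same length")
    # One input row per month that still has a target `horizon` months later
    X = [[series[name][t] for name in names] for t in range(n - horizon)]
    y = [series[target][t + horizon] for t in range(n - horizon)]
    return X, y


# Hypothetical monthly averages (12 months), for illustration only
toy = {
    "SO2": [5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0],
    "NO": [10.0, 11.0, 9.0, 12.0, 13.0, 11.0, 10.0, 9.0, 12.0, 14.0, 13.0, 12.0],
    "NO2": [20.0, 22.0, 21.0, 23.0, 25.0, 24.0, 22.0, 21.0, 23.0, 26.0, 25.0, 24.0],
    "PM10": [30.0, 32.0, 31.0, 35.0, 36.0, 34.0, 33.0, 31.0, 34.0, 38.0, 37.0, 35.0],
}

# One supervised dataset per forecast horizon, 1 to 7 months ahead
datasets = {h: make_horizon_dataset(toy, "PM10", h) for h in range(1, 8)}
```

Each `(X, y)` pair could then be passed to any regressor, fitting one model per horizon. Whether the study used this direct strategy or a recursive one is not stated in the abstract.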



Acknowledgements

The authors wish to thank the General Directorate of Prevention and Environmental Control (of the Ministry of Infrastructure, Spatial Planning and Environment of the Principality of Asturias) for providing the experimental dataset used in this research. Additionally, we would like to thank Anthony Ashworth for revising the English grammar and spelling of the manuscript.

Author information

Corresponding author

Correspondence to P. J. García Nieto.

About this article

Cite this article

García Nieto, P.J., Sánchez Lasheras, F., García-Gonzalo, E. et al. Estimation of PM10 concentration from air quality data in the vicinity of a major steelworks site in the metropolitan area of Avilés (Northern Spain) using machine learning techniques. Stoch Environ Res Risk Assess 32, 3287–3298 (2018). https://doi.org/10.1007/s00477-018-1565-6

  • DOI: https://doi.org/10.1007/s00477-018-1565-6

Keywords

  • Support vector regression (SVR)
  • Multilayer perceptron (MLP)
  • Vector autoregressive moving-average (VARMA)
  • Autoregressive integrated moving-average (ARIMA)
  • Monthly PM10 concentration
  • Pollution episode