Abstract
Atmospheric particulate matter (PM) is one of the pollutants that may have a significant impact on human health. Data collected over 7 years at the air quality monitoring station of the LD-III steelworks, which belongs to the Arcelor-Mittal Steel Company and is located in the metropolitan area of Avilés (Principality of Asturias, Northern Spain), are analyzed using four different mathematical models: vector autoregressive moving-average (VARMA), autoregressive integrated moving-average (ARIMA), multilayer perceptron (MLP) neural networks and support vector machines with regression (SVR). The monthly average concentrations of three pollutants (SO2, NO and NO2) and of PM10 (particles with a diameter less than 10 μm) are used as inputs to forecast the monthly average concentration of PM10 from 1 to 7 months ahead. Simulations showed that the ARIMA model performs better than the other models when forecasting 1 month ahead, while in the forecast from 1 to 7 months ahead the best performance is given by support vector regression.
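The forecasting setup described in the abstract can be illustrated with a minimal sketch. It assumes a direct strategy in which the pollutant values of month t are paired with the PM10 value of month t + h; the abstract does not say how the authors built their inputs, so the function name `make_supervised`, the choice of a single lag, and the toy numbers are all hypothetical and serve only to show the shape of the problem.

```python
from typing import Dict, List, Tuple


def make_supervised(
    series: Dict[str, List[float]], target: str, horizon: int
) -> Tuple[List[List[float]], List[float]]:
    """Pair the values of every series at month t with the target at month t + horizon.

    Hypothetical helper: one possible way to frame an h-months-ahead forecast
    as a regression problem, not the procedure used in the paper.
    """
    names = list(series)  # column order follows the dict's insertion order
    n = len(series[target])
    if any(len(series[name]) != n for name in names):
        raise ValueError("all series must have the same length")
    if not 1 <= horizon < n:
        raise ValueError("horizon must be between 1 and n - 1")
    # Inputs: one row per month t that still has a target h months later.
    X = [[series[name][t] for name in names] for t in range(n - horizon)]
    # Targets: the target series shifted forward by `horizon` months.
    y = [series[target][t + horizon] for t in range(n - horizon)]
    return X, y


# Made-up monthly averages, for illustration only (not measured data).
toy = {
    "SO2": [5.0, 6.0, 4.0, 7.0, 5.0, 6.0],
    "NO": [10.0, 12.0, 9.0, 11.0, 13.0, 10.0],
    "NO2": [20.0, 22.0, 19.0, 21.0, 23.0, 20.0],
    "PM10": [30.0, 35.0, 28.0, 33.0, 36.0, 31.0],
}

# Build the input/target pairs for a 2-months-ahead forecast.
X, y = make_supervised(toy, "PM10", horizon=2)
```

Under this framing, one regression model (for example an MLP or an SVR) would be fitted per horizon h = 1, ..., 7 on the corresponding `X` and `y`, whereas ARIMA and VARMA models produce multi-step forecasts from the fitted time series model itself.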
Acknowledgements
The authors wish to thank the General Directorate of Prevention and Environmental Control (of the Ministry of Infrastructure, Spatial Planning and Environment of the Principality of Asturias) for providing the experimental dataset used in this research. Additionally, we would like to thank Anthony Ashworth for revising the English grammar and spelling of the manuscript.
Cite this article
García Nieto, P.J., Sánchez Lasheras, F., García-Gonzalo, E. et al. Estimation of PM10 concentration from air quality data in the vicinity of a major steelworks site in the metropolitan area of Avilés (Northern Spain) using machine learning techniques. Stoch Environ Res Risk Assess 32, 3287–3298 (2018). https://doi.org/10.1007/s00477-018-1565-6
DOI: https://doi.org/10.1007/s00477-018-1565-6
Keywords
- Support vector regression (SVR)
- Multilayer perceptron (MLP)
- Vector autoregressive moving-average (VARMA)
- Autoregressive integrated moving-average (ARIMA)
- Monthly PM10 concentration
- Pollution episode