Abstract
The National Capital Region (NCR) encircling the capital of India is one of the most polluted regions in the world. Poor air quality contributes to a number of diseases and to reduced life expectancy. Particulate matter (PM) is the most significant as well as the most hazardous air pollutant in this region. This work proposes models to analyze and forecast PM concentrations at a location in the NCR. The correlation of PM concentrations across seasons, and with meteorological parameters and other air pollutants, is studied to determine the most suitable explanatory variables for building the forecast models. The performance of the proposed models is evaluated using variable importance ranking (VIR), partial plots, and error measures such as mean error, mean absolute error and root mean square error.
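As an illustration of the three error measures named above, the following is a minimal sketch in Python. It is not the authors' implementation. The function name `forecast_errors`, the sign convention for mean error (predicted minus observed), and the sample values are all assumptions made for this example.

```python
import math


def forecast_errors(observed, predicted):
    """Return (mean error, mean absolute error, root mean square error).

    Assumption: residuals are taken as predicted minus observed, so a
    positive mean error indicates over-prediction on average.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [p - o for o, p in zip(observed, predicted)]
    n = len(residuals)
    me = sum(residuals) / n                               # signed bias
    mae = sum(abs(r) for r in residuals) / n              # average magnitude
    rmse = math.sqrt(sum(r * r for r in residuals) / n)   # penalizes large misses
    return me, mae, rmse


# Made-up PM concentrations in µg/m³, for illustration only.
observed = [100.0, 120.0, 80.0, 150.0]
predicted = [110.0, 115.0, 90.0, 140.0]
me, mae, rmse = forecast_errors(observed, predicted)
print(f"ME={me:.2f}  MAE={mae:.2f}  RMSE={rmse:.2f}")
```

Mean error can be close to zero even when individual forecasts are poor, because positive and negative residuals cancel. That is one reason it is usually reported alongside the mean absolute error and the root mean square error rather than alone.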
Cite this article
Barthwal, A., Acharya, D. & Lohani, D. Prediction and analysis of particulate matter (PM2.5 and PM10) concentrations using machine learning techniques. J Ambient Intell Human Comput 14, 1323–1338 (2023). https://doi.org/10.1007/s12652-021-03051-w
DOI: https://doi.org/10.1007/s12652-021-03051-w