Abstract
Air pollution remains a major cause of health problems worldwide. Factors such as industrial development, growing vehicle traffic, and energy production degrade air quality by releasing harmful gases and particles into the atmosphere, which can lead to respiratory diseases, cardiovascular problems, and other health complications. Predicting air quality is therefore a crucial step in safeguarding human health and informing environmental policy. Many cities operate measurement instruments and data collection systems to monitor and forecast air quality, and the resulting data can be analyzed with machine learning models to predict future pollution levels. This article examines the performance of a new stacking ensemble model for estimating PM2.5 concentrations, using air quality datasets from major cities including Beijing and Istanbul. The model combines the predictions of several machine learning models. In the first stage of the study, the performance of models commonly used in the literature, such as the multi-layer perceptron, support vector regression, and random forest, was evaluated. These models were assessed on their ability to predict PM2.5 using mean absolute error (MAE), root mean squared error (RMSE), and the coefficient of determination (R2), which together measure how closely the model predictions match the observed data. The stacking ensemble model yielded the best PM2.5 predictions in this study, with an MAE of 6.67, an RMSE of 8.80, and an R2 of 0.91. In conclusion, the stacking ensemble offers a promising approach to air pollution prediction, outperforming the traditional machine learning models evaluated here.
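The abstract names the ingredients of the approach (base learners such as MLP, SVR, and random forest whose predictions are combined by a stacking model, scored with MAE, RMSE, and R2) but not which meta-learner combines them. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a linear least-squares meta-learner, uses only the Python standard library, and the toy numbers and all function names (`fit_meta_learner`, `predict_meta`, `solve`) are hypothetical.

```python
import math

# Hypothetical toy data (NOT from the study): observed PM2.5 values and
# predictions from three base learners on the same samples.
Y_TRUE = [12.0, 35.0, 48.0, 20.0, 60.0, 75.0, 30.0, 55.0]
MLP_PRED = [14.0, 30.0, 50.0, 25.0, 58.0, 70.0, 28.0, 52.0]
SVR_PRED = [10.0, 38.0, 45.0, 18.0, 65.0, 72.0, 33.0, 50.0]
RF_PRED = [15.0, 33.0, 47.0, 22.0, 57.0, 78.0, 29.0, 58.0]


def mae(y_true, y_pred):
    """MAE = (1/n) * sum |y_i - yhat_i|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """RMSE = sqrt((1/n) * sum (y_i - yhat_i)^2)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_meta_learner(base_preds, y):
    """Fit y ~ w0 + sum_k w_k * pred_k by ordinary least squares (normal equations).

    base_preds is a list of per-model prediction lists, each of length len(y).
    Returns [w0, w1, ..., wK].
    """
    rows = [[1.0] + [preds[i] for preds in base_preds] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def predict_meta(weights, base_preds):
    """Combine base-model predictions with the fitted linear meta-learner."""
    n = len(base_preds[0])
    return [weights[0] + sum(w * preds[i] for w, preds in zip(weights[1:], base_preds))
            for i in range(n)]


if __name__ == "__main__":
    bases = {"MLP": MLP_PRED, "SVR": SVR_PRED, "RF": RF_PRED}
    weights = fit_meta_learner(list(bases.values()), Y_TRUE)
    stacked = predict_meta(weights, list(bases.values()))
    for name, pred in list(bases.items()) + [("Stacked", stacked)]:
        print(f"{name}: MAE={mae(Y_TRUE, pred):.2f} "
              f"RMSE={rmse(Y_TRUE, pred):.2f} R2={r2(Y_TRUE, pred):.3f}")
```

Because this toy meta-learner is fitted and scored on the same samples, its in-sample RMSE can never exceed that of the best single base model (each base model alone is one of the candidate weightings). That says nothing about accuracy on unseen data, which is why stacking is normally trained on out-of-fold base-model predictions from k-fold cross-validation; scikit-learn's `StackingRegressor` (with its `cv` parameter) is one library that handles this step.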
Ethics declarations
Conflict of interest
The authors declare no conflicts of interest. All authors agree to the publication of this manuscript.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Editorial responsibility: Mohamed F. Yassin.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Emeç, M., Yurtsever, M. A novel ensemble machine learning method for accurate air quality prediction. Int. J. Environ. Sci. Technol. (2024). https://doi.org/10.1007/s13762-024-05671-z
DOI: https://doi.org/10.1007/s13762-024-05671-z