Abstract
Fine particulate matter (PM2.5), defined as airborne particles with an aerodynamic diameter of 2.5 μm or less, is a hazardous air pollutant that can cause severe health effects such as cardiovascular disease, respiratory illness, and various types of cancer. Accurate forecasting of PM2.5 concentrations is therefore crucial for public health and policy-making. However, because PM2.5 concentrations fluctuate stochastically, achieving high prediction accuracy and efficiency remains a challenge. To address this challenge, this study proposes a hybrid deep learning model that combines principal component analysis (PCA), the discrete stationary wavelet transform (DSWT), and a Nested LSTM (NLSTM) neural network to predict PM2.5 concentrations, leveraging the strengths of each technique to improve forecasting accuracy and efficiency. Specifically, PCA is employed for feature extraction, reducing the dimensionality of the input data and improving computational efficiency. DSWT then decomposes the reduced-dimensional data into several sub-signals that are more regular and stable, so that the NLSTM network can learn each sub-signal separately. Finally, the predicted values of the sub-signals are reconstructed to obtain the final PM2.5 forecast. The proposed model is validated using daily air pollutant and meteorological data collected in Taiyuan, China, from January 1, 2016, to December 31, 2020. Long-term, medium-term, and short-term forecast results demonstrate that the proposed model achieves better accuracy and efficiency than the existing models it is compared with. Overall, the proposed hybrid deep learning model offers a promising approach to accurate and efficient PM2.5 forecasting, and the findings have important implications for public health and environmental policy.
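The four-stage pipeline summarized above (feature reduction, decomposition, per-sub-signal forecasting, additive reconstruction) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several pieces are assumptions. The PCA step is plain SVD on standardized features. The decomposition is an additive, causal Haar "à trous" variant of the undecimated (stationary) wavelet transform, chosen because its sub-signals sum exactly back to the input; the paper's actual wavelet, number of levels, and boundary handling are not stated here. The Nested LSTM cell follows the general structure described by Moniz and Krueger (2017), in which an inner LSTM replaces the outer cell's additive memory update; the tanh activations, random untrained weights, and linear read-out are assumptions. All data are synthetic, and the names `pca_scores`, `haar_atrous`, and `NestedLSTMCell` are hypothetical.

```python
import numpy as np


def pca_scores(X, k):
    """Project standardized features onto their first k principal components (via SVD)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # in practice, fit mean/std on training data only
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    ratio = S**2 / np.sum(S**2)  # explained-variance ratio, largest first
    return Z @ Vt[:k].T, ratio[:k]


def haar_atrous(x, levels):
    """Additive, causal Haar 'a trous' (undecimated) decomposition.

    Every output has the same length as x, and x == approx + sum(details).
    """
    approx = np.asarray(x, dtype=float)
    if len(approx) <= 2 ** (levels - 1):
        raise ValueError("series too short for the requested number of levels")
    details = []
    for j in range(levels):
        step = 2**j  # the filter's "holes" double at each level
        # Lagged copy padded with the first value, so sample n only sees samples <= n.
        lagged = np.concatenate([np.full(step, approx[0]), approx[:-step]])
        smooth = 0.5 * (approx + lagged)  # Haar low-pass
        details.append(approx - smooth)   # detail = what this level's smoothing removed
        approx = smooth
    return approx, details


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class NestedLSTMCell:
    """Forward pass only: an inner LSTM replaces the outer memory update c = f*c + i*g."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        self.W = rng.normal(0.0, (n_in + n_hidden) ** -0.5, (n_in + n_hidden, 4 * n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.W_inner = rng.normal(0.0, (2 * n_hidden) ** -0.5, (2 * n_hidden, 4 * n_hidden))
        self.b_inner = np.zeros(4 * n_hidden)

    def step(self, x, h, c, c_inner):
        i, f, o, g = np.split(np.concatenate([x, h]) @ self.W + self.b, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        # The inner LSTM receives i*g as its input and f*c as its previous hidden state.
        z = np.concatenate([i * g, f * c]) @ self.W_inner + self.b_inner
        ii, fi, oi, gi = np.split(z, 4)
        c_inner = sigmoid(fi) * c_inner + sigmoid(ii) * np.tanh(gi)
        c = sigmoid(oi) * np.tanh(c_inner)  # inner hidden state becomes the outer memory
        h = sigmoid(o) * np.tanh(c)
        return h, c, c_inner

    def run(self, seq):
        h = c = c_inner = np.zeros(self.n_hidden)
        for x in seq:
            h, c, c_inner = self.step(x, h, c, c_inner)
        return h


# Synthetic stand-ins for the daily pollutant/meteorological features and PM2.5 series.
rng = np.random.default_rng(0)
T, window = 200, 14
features = rng.normal(size=(T, 8))
pm25 = 50 + 10 * np.sin(np.arange(T) / 7) + rng.normal(scale=3.0, size=T)

scores, ratio = pca_scores(features, k=3)          # 1) feature extraction
approx, details = haar_atrous(pm25, levels=3)      # 2) decomposition
sub_signals = [approx] + details

preds = []
for s in sub_signals:                              # 3) one (untrained) network per sub-signal
    cell = NestedLSTMCell(n_in=1 + scores.shape[1], n_hidden=4, rng=rng)
    w_out = rng.normal(size=4)                     # hypothetical linear read-out
    seq = np.column_stack([s[-window:], scores[-window:]])
    preds.append(float(cell.run(seq) @ w_out))
forecast = sum(preds)                              # 4) reconstruction by summation
```

Because the weights are random, `forecast` only demonstrates the data flow; training each per-sub-signal network would require backpropagation through time, usually handled by a deep learning framework. The causal padding is a deliberate design choice: a decomposition that uses periodic boundaries or centered filters mixes later samples into earlier sub-signal values, and decomposing the full series before the train/test split can leak future information into the inputs.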
Data availability
Air pollutant data can be obtained from https://www.aqistudy.cn/historydata/, and meteorological data can be obtained from the National Meteorological Science Data Center (https://data.cma.cn/).
Author information
Contributions
Both authors contributed to the study’s conception and design. Data collection and analysis were performed by Rui Zhang. The first draft of the manuscript was written by Rui Zhang. Norhashidah Awang supervised, reviewed, and edited the manuscript. Both authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, R., Awang, N. An ensemble NLSTM-based model for PM2.5 concentrations prediction considering feature extraction and data decomposition. Air Qual Atmos Health 16, 1969–1987 (2023). https://doi.org/10.1007/s11869-023-01385-2
DOI: https://doi.org/10.1007/s11869-023-01385-2