Abstract
With economic and industrial development, the impact of air pollution has drawn increasing attention. Many studies have applied machine learning techniques to build predictive models of pollutant concentration or air quality. However, improving prediction performance remains a common challenge across these studies. Conventional machine learning and deep learning models, such as GBTR (gradient boosted tree regression), SVR (support vector machine-based regression), and LSTM (long short-term memory), are among the most promising approaches to this problem. Previous research has shown that ensemble learning can improve predictive performance in other domains. In this paper, we propose a hybrid model and framework to improve the accuracy of air pollution forecasting. We use a stacking-based ensemble learning scheme, in which the Pearson correlation coefficient measures the correlation between different machine learning models, to integrate the forecasting models. We also construct a framework based on Spark+Hadoop machine learning and the TensorFlow deep learning framework that physically integrates these models to forecast air pollution over the next 1 to 8 h. We conduct experiments comparing the hybrid model with the GBTR, SVR, LSTM, and LSTM2 (version 2) models. The experimental results show that the hybrid model outperforms these existing models in predicting air pollution.
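The stacking idea summarized in the abstract can be illustrated with a minimal sketch. The abstract does not say how the Pearson correlation coefficient enters the ensemble, so the sketch assumes one plausible reading: the correlation is computed between the base models' predictions as a diagnostic, and a linear meta-learner then combines those predictions. The data are synthetic, the three noisy prediction columns merely stand in for GBTR, SVR, and LSTM outputs, and all function names are hypothetical rather than taken from the paper.

```python
import numpy as np


def pearson_matrix(preds):
    """Pairwise Pearson correlation between base-model predictions.

    preds has shape (n_samples, n_models); the result is (n_models, n_models).
    """
    return np.corrcoef(preds, rowvar=False)


def fit_stacking_weights(preds, y):
    """Fit a linear meta-learner (intercept + one weight per base model)
    by ordinary least squares. This is an assumed meta-learner, not
    necessarily the one used in the paper."""
    design = np.column_stack([np.ones(len(y)), preds])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def stack_predict(coef, preds):
    """Combine base-model predictions with the fitted meta-learner."""
    design = np.column_stack([np.ones(preds.shape[0]), preds])
    return design @ coef


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic hourly "pollutant concentration" series (illustrative only).
rng = np.random.default_rng(0)
n = 400
y = 30 + 10 * np.sin(np.linspace(0, 12, n)) + rng.normal(0, 1, n)

# Stand-ins for three base models: the target plus independent noise.
base_preds = np.column_stack([
    y + rng.normal(0, 3.0, n),
    y + rng.normal(0, 4.0, n),
    y + rng.normal(0, 5.0, n),
])

train, test = slice(0, 300), slice(300, n)

# Step 1: how strongly do the base models' predictions agree?
corr = pearson_matrix(base_preds[train])

# Step 2: fit the meta-learner on training predictions, apply to held-out data.
coef = fit_stacking_weights(base_preds[train], y[train])
stacked_train = stack_predict(coef, base_preds[train])
stacked_test = stack_predict(coef, base_preds[test])

print("Pearson correlation between base models:\n", corr.round(3))
print("Held-out RMSE of stacked model:", round(rmse(y[test], stacked_test), 3))
```

In practice the meta-learner should be fitted on out-of-fold predictions of the base models rather than on predictions for data the base models were trained on; otherwise the stacked weights favor whichever base model overfits most.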
Funding
This work was partially supported by the Ministry of Science and Technology of Taiwan, Republic of China, under Grant Nos. MOST 106-3114-M-305-001-A, MOST 108-2119-M-305-001-A, MOST 109-2119-M-305-001-A, and MOST 108-2321-B-027-001-; and by National Taipei University under Grant Nos. 106-NTPU_A-H&E-143-001, 107-NTPU_A-H&E-143-001, and 108-NTPU_A-H&E-143-001.
Additional information
Responsible editor: Marcus Schulz
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Chang, YS., Abimannan, S., Chiao, HT. et al. An ensemble learning based hybrid model and framework for air pollution forecasting. Environ Sci Pollut Res 27, 38155–38168 (2020). https://doi.org/10.1007/s11356-020-09855-1
DOI: https://doi.org/10.1007/s11356-020-09855-1