Abstract
Air pollution is one of the most serious threats to human health and a matter of growing public concern. Air quality forecasts provide fundamental decision-making support for environmental governance and emergency management, and more accurate forecasts are urgently needed. In this paper, we propose a novel spatial–temporal deep multitask learning (ST-DMTL) framework for air quality forecasting based on dynamic spatial panels of multiple data sources. Specifically, we develop a prediction model that combines multitask learning techniques with recurrent neural network (RNN) models, and we perform empirical analyses to evaluate the utility of each facet of the proposed framework on a real-world dataset of 451,509 hourly air quality records collected in China from January 2013 to September 2017. An application check is also conducted to verify the practical value of the proposed ST-DMTL framework. Our empirical results indicate that the framework is a viable approach to air quality forecasting.
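As a rough illustration of the multitask idea, the following NumPy sketch uses hard parameter sharing: every task reads one shared representation, each task has its own linear output head, and the joint loss is a weighted sum of per-task MSEs. It is not the ST-DMTL architecture. The shared layer is a single tanh layer rather than an RNN, and the function names, task names (`pm25`, `pm10`) and weights are hypothetical.

```python
import numpy as np

def multitask_forward(X, shared, heads):
    """Hard parameter sharing: one shared layer feeds a separate linear head per task."""
    W_s, b_s = shared
    H = np.tanh(X @ W_s + b_s)  # representation shared by all tasks
    return {task: H @ W + b for task, (W, b) in heads.items()}

def joint_loss(preds, targets, weights=None):
    """Weighted sum of per-task MSEs; minimizing it trains the shared layer on all tasks at once."""
    if weights is None:
        weights = {task: 1.0 for task in preds}
    return sum(weights[t] * np.mean((preds[t] - targets[t]) ** 2) for t in preds)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(8, 6))  # 8 samples, 6 input features (hypothetical)
    shared = (rng.normal(size=(6, 4)), np.zeros(4))
    heads = {t: (rng.normal(size=(4, 1)), np.zeros(1)) for t in ("pm25", "pm10")}
    preds = multitask_forward(X, shared, heads)
    targets = {t: np.zeros((8, 1)) for t in heads}
    print(joint_loss(preds, targets))
```

Sharing the encoder is what lets related targets (for example, different pollutants or stations) regularize one another; the per-task weights control how much each task pulls on the shared parameters.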
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 71771212 and U1711262) and by the Fundamental Research Funds for the Central Universities and the Research Funds of Renmin University of China (No. 15XNLQ08).
Appendix
1.1 Backpropagation through time (BPTT)
The basic equations of the LSTM used in this paper (first introduced in Sect. 4.2) are as follows:
The input, forget and output gates are computed as follows:
The loss is defined as MSE, given by:
Each training sample contains the observations acquired during the time window considered in the temporal analysis and can therefore be regarded as a sequence of length \( T \). As a result, the total error of a sample at time \( t \) is the error accumulated over the observations from \( t - T \) to \( t \).
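A minimal NumPy sketch of how an hourly series could be cut into samples of length \( T \), with a label at every step so that the error can be accumulated over the whole window. The helper names, the `horizon` parameter and the choice of per-step labels are assumptions for illustration, not the paper's preprocessing.

```python
import numpy as np

def make_windows(features, target, T, horizon=1):
    """Cut an hourly series into overlapping samples of length T.

    Sample k holds features[k : k+T]; its per-step labels are the target values
    `horizon` hours after each observation in the window (an assumption).
    """
    features, target = np.asarray(features), np.asarray(target)
    n_samples = len(features) - T - horizon + 1
    if n_samples <= 0:
        raise ValueError("series must be longer than T + horizon - 1")
    X = np.stack([features[k:k + T] for k in range(n_samples)])
    Y = np.stack([target[k + horizon:k + T + horizon] for k in range(n_samples)])
    return X, Y

def window_mse(forecasts, labels):
    """Squared errors accumulated over all T steps, then averaged into an MSE."""
    sq = (np.asarray(forecasts) - np.asarray(labels)) ** 2
    return sq.sum() / sq.size
```

For example, 10 hourly rows with \( T = 3 \) and a one-hour horizon yield 7 overlapping samples, each with 3 per-step labels.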
We then calculate the gradient of the loss function with respect to each parameter of the model. As with the summation in the loss, the gradient for each training example is summed over the time steps from \( t - T \) to \( t \).
The chain rule of differentiation is used in the gradient calculation process. We first take the partial derivative with respect to \( Forecast_{t} \) and obtain:
Then, we take the partial derivatives with respect to \( W_{F} \), \( H_{t} \) and \( b_{F} \):
Because \( H_{t} \) depends on \( G_{o, t} \) and \( C_{t} \), we apply the chain rule again and obtain:
Additionally, \( G_{o, t} \) depends on \( M_{o} \), \( N_{o} \) and \( P_{o} \), so we obtain \( \delta M_{o} \), \( \delta N_{o} \) and \( \delta P_{o} \) from the following:
For \( C_{t} \), partial derivatives with respect to \( G_{i, t} \), \( G_{f, t} \) and \( \tilde{C}_{t} \) are needed:
Then, the gradients of the loss with respect to parameters \( M \), \( N \) and \( P \) can be obtained:
where \( S \in \left\{ {i, f} \right\} \) denotes the type of gate that the parameter affects. Finally, the partial derivatives with respect to \( M_{C} \) and \( N_{C} \) are as follows:
At this point, the gradients of the loss with respect to all parameters (\( M_{i} \), \( N_{i} \), \( P_{i} \), \( M_{f} \), \( N_{f} \), \( P_{f} \), \( M_{o} \), \( N_{o} \), \( P_{o} \), \( M_{C} \), \( N_{C} \), \( W_{F} \) and \( b_{F} \)) have been obtained, and mini-batch gradient descent (MBGD) is used to learn the parameters (according to formula (22) in Sect. 4.2). Notably, each gradient depends only on the current values of the terms on the right-hand side of the equations.
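The derivation above can be checked with a short NumPy sketch of BPTT followed by MBGD. It assumes a standard LSTM parameterization that is consistent with the parameter list in the text but may differ in detail from the equations in Sect. 4.2: each gate is \( G_{S,t} = \sigma(M_S X_t + N_S H_{t-1} + P_S) \) for \( S \in \{i, f, o\} \), with \( P_S \) treated as a bias; the candidate cell is \( \tilde{C}_t = \tanh(M_C X_t + N_C H_{t-1}) \); \( C_t = G_{f,t} \odot C_{t-1} + G_{i,t} \odot \tilde{C}_t \) and \( H_t = G_{o,t} \odot \tanh(C_t) \); \( Forecast_t = W_F H_t + b_F \) is produced at every step; and the loss is the MSE over the window. All function names are hypothetical, and `mbgd` is a plain mini-batch update rather than a reproduction of formula (22).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in, n_hid, n_out, seed=0):
    """Hypothetical initializer using the appendix's parameter names."""
    rng = np.random.default_rng(seed)
    p = {}
    for s in ("i", "f", "o", "C"):
        p["M_" + s] = rng.normal(0.0, 0.5, (n_hid, n_in))   # input weights
        p["N_" + s] = rng.normal(0.0, 0.5, (n_hid, n_hid))  # recurrent weights
    for s in ("i", "f", "o"):
        p["P_" + s] = rng.normal(0.0, 0.5, n_hid)           # assumed gate biases
    p["W_F"] = rng.normal(0.0, 0.5, (n_out, n_hid))
    p["b_F"] = rng.normal(0.0, 0.5, n_out)
    return p

def forward(p, X):
    """Run one window X (T, n_in); return forecasts (T, n_out) and a cache for BPTT."""
    h = np.zeros(p["N_i"].shape[0])
    c = np.zeros_like(h)
    cache, F = [], []
    for x in X:
        g_i = sigmoid(p["M_i"] @ x + p["N_i"] @ h + p["P_i"])
        g_f = sigmoid(p["M_f"] @ x + p["N_f"] @ h + p["P_f"])
        g_o = sigmoid(p["M_o"] @ x + p["N_o"] @ h + p["P_o"])
        c_tl = np.tanh(p["M_C"] @ x + p["N_C"] @ h)  # candidate cell, no bias
        c_new = g_f * c + g_i * c_tl
        h_new = g_o * np.tanh(c_new)
        cache.append((x, h, c, g_i, g_f, g_o, c_tl, c_new))
        F.append(p["W_F"] @ h_new + p["b_F"])
        h, c = h_new, c_new
    return np.array(F), cache

def loss(p, X, Y):
    """MSE over all T steps of the window."""
    return np.mean((forward(p, X)[0] - Y) ** 2)

def bptt(p, X, Y):
    """Gradients of the window MSE w.r.t. every parameter, summed over time steps."""
    F, cache = forward(p, X)
    grads = {k: np.zeros_like(v) for k, v in p.items()}
    dF_all = 2.0 * (F - Y) / F.size        # d loss / d Forecast_t
    dh_next = np.zeros(p["N_i"].shape[0])  # gradient reaching H_t from step t+1
    dc_next = np.zeros_like(dh_next)       # gradient reaching C_t from step t+1
    for t in reversed(range(len(X))):
        x, h_prev, c_prev, g_i, g_f, g_o, c_tl, c = cache[t]
        tanh_c = np.tanh(c)
        grads["W_F"] += np.outer(dF_all[t], g_o * tanh_c)
        grads["b_F"] += dF_all[t]
        dh = p["W_F"].T @ dF_all[t] + dh_next
        # H_t = G_o * tanh(C_t): route dh to the output gate and to the cell state
        dc = dh * g_o * (1.0 - tanh_c ** 2) + dc_next
        # Pre-activation gradients; C_t = G_f * C_{t-1} + G_i * candidate
        da = {"i": dc * c_tl * g_i * (1.0 - g_i),
              "f": dc * c_prev * g_f * (1.0 - g_f),
              "o": dh * tanh_c * g_o * (1.0 - g_o),
              "C": dc * g_i * (1.0 - c_tl ** 2)}
        dh_next = np.zeros_like(dh)
        for s, d in da.items():
            grads["M_" + s] += np.outer(d, x)
            grads["N_" + s] += np.outer(d, h_prev)
            if s != "C":
                grads["P_" + s] += d
            dh_next += p["N_" + s].T @ d
        dc_next = dc * g_f
    return grads

def mbgd(p, Xs, Ys, lr=0.1, batch_size=4, epochs=50, seed=0):
    """Mini-batch gradient descent over windows Xs (n, T, n_in), Ys (n, T, n_out)."""
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        order = rng.permutation(len(Xs))
        for start in range(0, len(Xs), batch_size):
            batch = order[start:start + batch_size]
            grads = [bptt(p, Xs[j], Ys[j]) for j in batch]
            for k in p:
                p[k] -= lr * sum(g[k] for g in grads) / len(batch)
    return p
```

The cache keeps each step's \( H_{t-1} \), \( C_{t-1} \) and gate activations, so every gradient is computed from values already stored during the forward pass. A cheap way to validate hand-derived BPTT gradients like these is to compare them with central finite differences of `loss`, perturbing one parameter entry at a time.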
Cite this article
Sun, X., Xu, W., Jiang, H. et al. A deep multitask learning approach for air quality prediction. Ann Oper Res 303, 51–79 (2021). https://doi.org/10.1007/s10479-020-03734-1
DOI: https://doi.org/10.1007/s10479-020-03734-1