
A deep multitask learning approach for air quality prediction

  • S.I.: Data Mining and Decision Analytics

Abstract

Air pollution is one of the most serious threats to human health and an issue of growing public concern. Air quality forecasts play a fundamental role in providing decision-making support for environmental governance and emergency management, and there is an imperative need for more accurate forecasts. In this paper, we propose a novel spatial–temporal deep multitask learning (ST-DMTL) framework for air quality forecasting based on dynamic spatial panels of multiple data sources. Specifically, we develop a prediction model by combining multitask learning techniques with recurrent neural network (RNN) models and perform empirical analyses to evaluate the utility of each facet of the proposed framework based on a real-world dataset that contains 451,509 air quality records generated on an hourly basis from January 2013 to September 2017 in China. An application check is also conducted to verify the practical value of our proposed ST-DMTL framework. Our empirical results indicate the efficacy of the framework as a viable approach for air quality forecasts.





Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 71771212 and U1711262) and by the Fundamental Research Funds for the Central Universities and Research Funds of Renmin University of China (No. 15XNLQ08).

Author information

Corresponding author

Correspondence to Wei Xu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Backpropagation through time (BPTT)

The basic equations of the LSTM applied in our paper are as follows (first introduced in Sect. 4.2):

$$ \begin{aligned} & Forecast_{t} = f_{F} \left( {H_{t} } \right) = W_{F} H_{t} + b_{F} \\ & H_{t} = G_{o, t} \odot \tanh C_{t} \\ & C_{t} = G_{f,t} \odot C_{t - 1} + G_{i,t} \odot \tilde{C}_{t} \\ & \tilde{C}_{t} = \tanh \left( {M_{C} X_{t} + N_{C} H_{t - 1} } \right) \\ \end{aligned} $$

The formulas for the three gates of input, forget and output are as follows:

$$ \begin{aligned} & G_{i, t} = \sigma \left( {M_{i} X_{t} + N_{i} H_{t - 1} + P_{i} C_{t - 1} } \right) \\ & G_{f, t} = \sigma \left( {M_{f} X_{t} + N_{f} H_{t - 1} + P_{f} C_{t - 1} } \right) \\ & G_{o, t} = \sigma \left( {M_{o} X_{t} + N_{o} H_{t - 1} + P_{o} C_{t} } \right) \\ \end{aligned} $$
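For concreteness, the following minimal NumPy sketch runs one forward step of this peephole LSTM cell together with the forecast layer. The shapes, random initial values, and variable names are our own assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
d_x, d_h = 3, 4                                # assumed input / hidden sizes

# Gate and candidate weights; peephole weights P_* are element-wise vectors here.
M_i, M_f, M_o, M_C = (rng.normal(scale=0.1, size=(d_h, d_x)) for _ in range(4))
N_i, N_f, N_o, N_C = (rng.normal(scale=0.1, size=(d_h, d_h)) for _ in range(4))
P_i, P_f, P_o = (rng.normal(scale=0.1, size=d_h) for _ in range(3))
W_F, b_F = rng.normal(scale=0.1, size=d_h), 0.0

X_t = rng.normal(size=d_x)                     # current input
H_prev, C_prev = np.zeros(d_h), np.zeros(d_h)  # H_{t-1}, C_{t-1}

G_i = sigmoid(M_i @ X_t + N_i @ H_prev + P_i * C_prev)  # input gate
G_f = sigmoid(M_f @ X_t + N_f @ H_prev + P_f * C_prev)  # forget gate
C_tilde = np.tanh(M_C @ X_t + N_C @ H_prev)             # candidate cell state
C_t = G_f * C_prev + G_i * C_tilde                      # new cell state
G_o = sigmoid(M_o @ X_t + N_o @ H_prev + P_o * C_t)     # output gate (uses C_t)
H_t = G_o * np.tanh(C_t)                                # hidden state
forecast_t = W_F @ H_t + b_F                            # Forecast_t = W_F H_t + b_F
```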

The loss is defined as MSE, given by:

$$ L_{MSE} = \left( {Forecast_{t} - TrueValue_{t} } \right)^{2} $$

Each training sample contains the observations acquired during the time window considered in the temporal analysis, which can be regarded as a sequence of length \( T \). As a result, the total error of the sample at time \( t \) is the error accumulated over the observations from time \( t - T \) to time \( t \).

We calculate the gradient of our loss function with respect to each parameter in the model. As with the loss, the gradient for each training example is summed over the time steps from \( t - T \) to \( t \).
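As a small, self-contained illustration of this accumulation (our own sketch with made-up values, not the paper's code), the window loss is the sum of the per-step squared errors, and the per-step deltas with respect to the forecasts are what a full BPTT pass would push back through the gate equations and sum into the parameter gradients:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 5                                    # assumed window length
forecasts = rng.normal(size=T)           # stand-ins for Forecast at each step
true_values = rng.normal(size=T)         # stand-ins for the observed values

window_loss = np.sum((forecasts - true_values) ** 2)  # summed per-step errors
delta_forecasts = 2.0 * (forecasts - true_values)     # per-step dL/dForecast
print(window_loss, delta_forecasts)
```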

The chain rule of differentiation is used in the gradient calculation process. We first take the partial derivative with respect to \( Forecast_{t} \) and obtain:

$$ \delta Forecast_{t} = \frac{{\partial L_{MSE} }}{{\partial Forecast_{t} }} = 2\left( {Forecast_{t} - TrueValue_{t} } \right) $$

Then, we take the partial derivatives with respect to \( W_{F} \), \( H_{t} \), and \( b_{F} \):

$$ \begin{aligned} & \delta W_{F} = \frac{{\partial L_{MSE} }}{{\partial W_{F} }} = \frac{{\partial L_{MSE} }}{{\partial Forecast_{t} }}\frac{{\partial Forecast_{t} }}{{\partial W_{F} }} = \delta Forecast_{t} H_{t} \\ & \delta b_{F} = \frac{{\partial L_{MSE} }}{{\partial b_{F} }} = \frac{{\partial L_{MSE} }}{{\partial Forecast_{t} }}\frac{{\partial Forecast_{t} }}{{\partial b_{F} }} = \delta Forecast_{t} \\ & \delta H_{t} = \frac{{\partial L_{MSE} }}{{\partial H_{t} }} = \frac{{\partial L_{MSE} }}{{\partial Forecast_{t} }}\frac{{\partial Forecast_{t} }}{{\partial H_{t} }} = \delta Forecast_{t} W_{F} \\ \end{aligned} $$
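For example, with an assumed hidden size of four and a scalar forecast head (all values made up for illustration), these three gradients follow directly from the chain rule:

```python
import numpy as np

rng = np.random.default_rng(2)
H_t = rng.normal(size=4)                 # hidden state, d_h = 4 (assumed)
W_F, b_F = rng.normal(size=4), 0.1       # scalar forecast head (assumed)
forecast_t = W_F @ H_t + b_F
true_value_t = 0.5

delta_forecast = 2.0 * (forecast_t - true_value_t)
delta_W_F = delta_forecast * H_t         # dL/dW_F
delta_b_F = delta_forecast               # dL/db_F
delta_H_t = delta_forecast * W_F         # dL/dH_t, pushed into the gates next
```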

As \( H_{t} \) depends on \( G_{o, t} \) and \( C_{t} \), we apply the chain rule again and obtain

$$ \begin{aligned} & \delta G_{o, t} = \frac{{\partial L_{MSE} }}{{\partial G_{o, t} }} = \frac{{\partial L_{MSE} }}{{\partial H_{t} }}\frac{{\partial H_{t} }}{{\partial G_{o, t} }} = \delta H_{t} \tanh C_{t} \\ & \delta C_{t} = \frac{{\partial L_{MSE} }}{{\partial C_{t} }} = \frac{{\partial L_{MSE} }}{{\partial H_{t} }}\frac{{\partial H_{t} }}{{\partial C_{t} }} = \delta H_{t} G_{o, t} \left( {1 - \tanh^{2} \left( {C_{t} } \right)} \right) \\ \end{aligned} $$

Additionally, \( G_{o, t} \) depends on \( M_{o} \), \( N_{o} \) and \( P_{o} \), so we obtain \( \delta M_{o} \), \( \delta N_{o} \) and \( \delta P_{o} \) from the following:

$$ \begin{aligned} & \delta M_{o} = \frac{{\partial L_{MSE} }}{{\partial M_{o} }} = \frac{{\partial L_{MSE} }}{{\partial G_{o, t} }}\frac{{\partial G_{o, t} }}{{\partial M_{o} }} = \delta G_{o, t} \left( {1 - G_{o, t} } \right)G_{o, t} X_{t} \\ & \delta N_{o} = \frac{{\partial L_{MSE} }}{{\partial N_{o} }} = \frac{{\partial L_{MSE} }}{{\partial G_{o, t} }}\frac{{\partial G_{o, t} }}{{\partial N_{o} }} = \delta G_{o, t} \left( {1 - G_{o, t} } \right)G_{o, t} H_{t - 1} \\ & \delta P_{o} = \frac{{\partial L_{MSE} }}{{\partial P_{o} }} = \frac{{\partial L_{MSE} }}{{\partial G_{o, t} }}\frac{{\partial G_{o, t} }}{{\partial P_{o} }} = \delta G_{o, t} \left( {1 - G_{o, t} } \right)G_{o, t} C_{t} \\ \end{aligned} $$

For \( C_{t} \), partial derivatives with respect to \( G_{i, t} \), \( G_{f, t} \) and \( \tilde{C}_{t} \) are needed:

$$ \begin{aligned} & \delta G_{i, t} = \frac{{\partial L_{MSE} }}{{\partial G_{i, t} }} = \frac{{\partial L_{MSE} }}{{\partial C_{t} }}\frac{{\partial C_{t} }}{{\partial G_{i, t} }} = \delta C_{t} \tilde{C}_{t} \\ & \delta G_{f, t} = \frac{{\partial L_{MSE} }}{{\partial G_{f, t} }} = \frac{{\partial L_{MSE} }}{{\partial C_{t} }}\frac{{\partial C_{t} }}{{\partial G_{f, t} }} = \delta C_{t} C_{t - 1} \\ & \delta \tilde{C}_{t} = \frac{{\partial L_{MSE} }}{{\partial \tilde{C}_{t} }} = \frac{{\partial L_{MSE} }}{{\partial C_{t} }}\frac{{\partial C_{t} }}{{\partial \tilde{C}_{t} }} = \delta C_{t} G_{i, t} \\ \end{aligned} $$

Then, the gradients of the loss with respect to parameters \( M \), \( N \) and \( P \) can be obtained:

$$ \begin{aligned} & \delta M_{S} = \frac{{\partial L_{MSE} }}{{\partial M_{S} }} = \frac{{\partial L_{MSE} }}{{\partial G_{S, t} }}\frac{{\partial G_{S, t} }}{{\partial M_{S} }} = \delta G_{S, t} \left( {1 - G_{S, t} } \right)G_{S, t} X_{t} \\ & \delta N_{S} = \frac{{\partial L_{MSE} }}{{\partial N_{S} }} = \frac{{\partial L_{MSE} }}{{\partial G_{S, t} }}\frac{{\partial G_{S, t} }}{{\partial N_{S} }} = \delta G_{S, t} \left( {1 - G_{S, t} } \right)G_{S, t} H_{t - 1} \\ & \delta P_{S} = \frac{{\partial L_{MSE} }}{{\partial P_{S} }} = \frac{{\partial L_{MSE} }}{{\partial G_{S, t} }}\frac{{\partial G_{S, t} }}{{\partial P_{S} }} = \delta G_{S, t} \left( {1 - G_{S, t} } \right)G_{S, t} C_{t - 1} \\ \end{aligned} $$

where \( S \) denotes the gate that the parameter affects, \( S \in \left\{ {i, f} \right\} \), since the input and forget gates peek at \( C_{t - 1} \). Finally, the partial derivatives for \( M_{C} \) and \( N_{C} \) are as follows:

$$ \begin{aligned} & \delta M_{C} = \frac{{\partial L_{MSE} }}{{\partial M_{C} }} = \frac{{\partial L_{MSE} }}{{\partial \tilde{C}_{t} }}\frac{{\partial \tilde{C}_{t} }}{{\partial M_{C} }} = \delta \tilde{C}_{t} \left( {1 - \tilde{C}_{t}^{2} } \right)X_{t} \\ & \delta N_{C} = \frac{{\partial L_{MSE} }}{{\partial N_{C} }} = \frac{{\partial L_{MSE} }}{{\partial \tilde{C}_{t} }}\frac{{\partial \tilde{C}_{t} }}{{\partial N_{C} }} = \delta \tilde{C}_{t} \left( {1 - \tilde{C}_{t}^{2} } \right)H_{t - 1} \\ \end{aligned} $$

At this point, the gradients of the loss with respect to all parameters (\( M_{i} \), \( N_{i} \), \( P_{i} \), \( M_{f} \), \( N_{f} \), \( P_{f} \), \( M_{o} \), \( N_{o} \), \( P_{o} \), \( M_{C} \), \( N_{C} \), \( W_{F} \), and \( b_{F} \)) have been obtained, and mini-batch gradient descent (MBGD) is used to learn the parameters (according to formula (22) in Sect. 4.2). Notably, each gradient depends only on the current values of the quantities on the right-hand side of its equation.
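To tie the steps together, the following self-contained sketch runs one forward step, computes the deltas derived above, and applies a plain gradient-descent step as a stand-in for the MBGD update of Sect. 4.2 (not reproduced here). All shapes, seeds, and the learning rate are our assumptions, and the propagation of \( \delta H_{t-1} \) and \( \delta C_{t-1} \) to earlier time steps is omitted for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(42)
d_x, d_h = 3, 4                                   # assumed input / hidden sizes

# Parameters; peephole weights P_* are element-wise vectors here.
M = {g: rng.normal(scale=0.1, size=(d_h, d_x)) for g in ("i", "f", "o", "C")}
N = {g: rng.normal(scale=0.1, size=(d_h, d_h)) for g in ("i", "f", "o", "C")}
P = {g: rng.normal(scale=0.1, size=d_h) for g in ("i", "f", "o")}
W_F, b_F = rng.normal(scale=0.1, size=d_h), 0.0

X_t = rng.normal(size=d_x)
H_prev, C_prev = np.zeros(d_h), np.zeros(d_h)     # H_{t-1}, C_{t-1}
true_value_t = 1.0

# Forward step (same equations as in this appendix).
G_i = sigmoid(M["i"] @ X_t + N["i"] @ H_prev + P["i"] * C_prev)
G_f = sigmoid(M["f"] @ X_t + N["f"] @ H_prev + P["f"] * C_prev)
C_tilde = np.tanh(M["C"] @ X_t + N["C"] @ H_prev)
C_t = G_f * C_prev + G_i * C_tilde
G_o = sigmoid(M["o"] @ X_t + N["o"] @ H_prev + P["o"] * C_t)
H_t = G_o * np.tanh(C_t)
forecast_t = W_F @ H_t + b_F

# Backward step, mirroring the deltas derived above.
d_forecast = 2.0 * (forecast_t - true_value_t)
d_W_F, d_b_F = d_forecast * H_t, d_forecast
d_H = d_forecast * W_F
d_G_o = d_H * np.tanh(C_t)
d_C = d_H * G_o * (1.0 - np.tanh(C_t) ** 2)
d_G_i, d_G_f, d_C_tilde = d_C * C_tilde, d_C * C_prev, d_C * G_i

gates = {"i": (G_i, d_G_i, C_prev), "f": (G_f, d_G_f, C_prev), "o": (G_o, d_G_o, C_t)}
grads = {}
for g, (gate, d_gate, peep_C) in gates.items():
    d_pre = d_gate * gate * (1.0 - gate)          # sigmoid derivative
    grads["M_" + g] = np.outer(d_pre, X_t)
    grads["N_" + g] = np.outer(d_pre, H_prev)
    grads["P_" + g] = d_pre * peep_C
d_pre_C = d_C_tilde * (1.0 - C_tilde ** 2)        # tanh derivative
grads["M_C"] = np.outer(d_pre_C, X_t)
grads["N_C"] = np.outer(d_pre_C, H_prev)

# One plain gradient step (stand-in for the paper's mini-batch update).
lr = 0.01
for g in ("i", "f", "o", "C"):
    M[g] -= lr * grads["M_" + g]
    N[g] -= lr * grads["N_" + g]
    if g in P:
        P[g] -= lr * grads["P_" + g]
W_F -= lr * d_W_F
b_F -= lr * d_b_F
print("squared error:", (forecast_t - true_value_t) ** 2)
```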


About this article


Cite this article

Sun, X., Xu, W., Jiang, H. et al. A deep multitask learning approach for air quality prediction. Ann Oper Res 303, 51–79 (2021). https://doi.org/10.1007/s10479-020-03734-1

