Quality control of online monitoring data of air pollutants using artificial neural networks

Air Quality, Atmosphere & Health

Abstract

The intensive monitoring of air pollutants has produced vast quantities of data. Traditional quality control methods based on existing knowledge may be inefficient because our understanding of the interaction between human activities and stochastic environmental factors is limited. Moreover, traditional outlier detection may be misleading because valid outliers and invalid inliers both exist. In this research, artificial neural networks (ANNs) are developed to identify instrument failure from current and historical observations. Two structures, multilayer perceptrons and recurrent networks, are trained using 50,000 hourly data points labeled by human reviewers. The most conservative model identified 57.5% of the invalid sulfur compound observations and 44.9% of the invalid nitrogen compound observations. With a more liberal threshold, these values increased to 76.0% and 79.7%, respectively. Except for SO2, the ANNs outperformed the traditional data quality control methods they were compared against: a plausibility test, a test of temporal consistency and a residual analysis. Compared with the test of temporal consistency, which was the most effective traditional method studied, the true positive rates of the ANNs were 19.4% to 29.5% higher for all pollutants except SO2 at the same false positive rates. The results indicate that ANNs are effective for data quality control even without supplementary information. Methods for performance improvement are discussed.
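The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the plausible range, the maximum hourly step, the toy concentrations and the classifier scores are all assumptions chosen for illustration. It shows a plausibility test, a test of temporal consistency, and how a conservative or liberal score threshold trades true positive rate against false positive rate.

```python
from typing import List, Sequence, Tuple


def plausibility_flags(values: Sequence[float], lo: float, hi: float) -> List[bool]:
    """Flag observations outside an assumed plausible range [lo, hi]."""
    return [not (lo <= v <= hi) for v in values]


def temporal_consistency_flags(values: Sequence[float], max_step: float) -> List[bool]:
    """Flag observations that differ from the previous hour by more than max_step."""
    if not values:
        return []
    flags = [False]  # the first point has no predecessor to compare with
    for prev, cur in zip(values, values[1:]):
        flags.append(abs(cur - prev) > max_step)
    return flags


def rates(flags: Sequence[bool], invalid: Sequence[bool]) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate) against reviewer labels."""
    tp = sum(f and y for f, y in zip(flags, invalid))
    fp = sum(f and not y for f, y in zip(flags, invalid))
    pos = sum(invalid)
    neg = len(invalid) - pos
    return tp / pos, fp / neg


def threshold_flags(scores: Sequence[float], threshold: float) -> List[bool]:
    """Flag observations whose (hypothetical) model score reaches the threshold."""
    return [s >= threshold for s in scores]


# Toy hourly concentrations and reviewer labels (True = invalid). Assumed data.
values = [12.0, 13.0, 12.5, 80.0, 12.0, 11.5, -3.0, 12.0]
invalid = [False, False, False, True, False, False, True, False]
# Hypothetical classifier scores; a higher score means "more likely invalid".
scores = [0.05, 0.10, 0.08, 0.90, 0.30, 0.07, 0.60, 0.12]

print("plausibility:", rates(plausibility_flags(values, 0.0, 500.0), invalid))
print("temporal:    ", rates(temporal_consistency_flags(values, 20.0), invalid))
print("conservative:", rates(threshold_flags(scores, 0.8), invalid))
print("liberal:     ", rates(threshold_flags(scores, 0.25), invalid))
```

In this toy series the range check misses the in-range spike, while the step check catches the spike but also flags the valid reading that follows it. Lowering the score threshold catches more invalid points at the cost of more false alarms, which is the trade-off behind the conservative and liberal results reported above.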

Funding

This study was funded by the National Natural Science Foundation of China (Projects 21577090 and 21777094), the National Science and Technology Support Program (Project 2014BAC22B07) and the Shanghai Jiao Tong University China Institute for Urban Governance (Project SJTU-2019UGBD-01).

Author information

Corresponding authors

Correspondence to Jingjing Feng or Jinping Cheng.

Electronic supplementary material

ESM 1 (DOCX 728 kb)

Cite this article

Wang, Z., Feng, J., Fu, Q. et al. Quality control of online monitoring data of air pollutants using artificial neural networks. Air Qual Atmos Health 12, 1189–1196 (2019). https://doi.org/10.1007/s11869-019-00734-4

  • DOI: https://doi.org/10.1007/s11869-019-00734-4
