Abstract
The intensive monitoring of air pollutants has led to the acquisition of vast quantities of data. Traditional quality control methods based on existing knowledge may be inefficient because of our limited understanding of the interaction between human activities and stochastic environmental factors. Moreover, traditional methods for outlier detection may be misleading because of the existence of valid outliers and invalid inliers. In this research, artificial neural networks (ANNs) are developed to identify instrument failure based on current and historical observations. Two structures, i.e., multilayer perceptrons and recurrent networks, are trained using 50,000 hourly data points labeled by human reviewers. The most conservative model identified 57.5% of the invalid sulfur compound observations and 44.9% of the invalid nitrogen compound observations. With a more liberal threshold, these values increased to 76.0% and 79.7%, respectively. Except for SO2, the ANNs outperformed the traditional methods for data quality control, namely a plausibility test, a test of temporal consistency and a residual analysis. Compared with the test of temporal consistency, which was the most effective traditional method studied, the true positive rates of the ANNs were 19.4% to 29.5% higher for all pollutants except SO2, given the same false positive rates. The results indicate the effectiveness of ANNs for data quality control even without supplementary information. Methods for performance improvement are discussed.
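The role of the decision threshold, and the comparison of true positive rates at a matched false positive rate, can be illustrated with a minimal sketch. It is not the implementation used in the study: the function names `rates_at_threshold` and `tpr_at_matched_fpr`, the scores and the labels are all hypothetical, and the sketch assumes only that a classifier outputs a score per observation and that higher scores mean "more likely invalid".

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (TPR, FPR) when observations scoring >= threshold are flagged invalid.

    labels: 1 = invalid observation (positive class), 0 = valid observation.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    positives = sum(labels)
    negatives = len(labels) - positives
    return tp / positives, fp / negatives


def tpr_at_matched_fpr(scores, labels, max_fpr):
    """Highest TPR reachable by any threshold whose FPR does not exceed max_fpr."""
    best = 0.0
    for t in sorted(set(scores)):
        tpr, fpr = rates_at_threshold(scores, labels, t)
        if fpr <= max_fpr:
            best = max(best, tpr)
    return best


# Made-up scores and labels for illustration only (not data from the study).
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0]

# A high (conservative) threshold flags fewer points: fewer false alarms, fewer detections.
conservative = rates_at_threshold(scores, labels, 0.70)
# A lower (liberal) threshold detects more invalid points at the cost of more false alarms.
liberal = rates_at_threshold(scores, labels, 0.35)

print("conservative (TPR, FPR):", conservative)
print("liberal (TPR, FPR):", liberal)
print("best TPR with FPR <= 0.25:", tpr_at_matched_fpr(scores, labels, 0.25))
```

Two detectors can be compared on equal terms by fixing the false positive rate budget and calling `tpr_at_matched_fpr` on each detector's scores, which mirrors the "given the same false positive rates" comparison described above.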
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11869-019-00734-4/MediaObjects/11869_2019_734_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11869-019-00734-4/MediaObjects/11869_2019_734_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11869-019-00734-4/MediaObjects/11869_2019_734_Fig3_HTML.png)
Funding
This study was funded by the National Natural Science Foundation of China (Projects 21577090 and 21777094), the National Science and Technology Support Program (Project 2014BAC22B07) and the Shanghai Jiao Tong University China Institute for Urban Governance (Project SJTU-2019UGBD-01).
Cite this article
Wang, Z., Feng, J., Fu, Q. et al. Quality control of online monitoring data of air pollutants using artificial neural networks. Air Qual Atmos Health 12, 1189–1196 (2019). https://doi.org/10.1007/s11869-019-00734-4
DOI: https://doi.org/10.1007/s11869-019-00734-4