Quality control of online monitoring data of air pollutants using artificial neural networks

Wang, Ziyu; Feng, Jingjing; Fu, Qingyan; Gao, Song; Chen, Xiaojia; Cheng, Jinping

doi:10.1007/s11869-019-00734-4

Published: 07 August 2019

Volume 12, pages 1189–1196, (2019)

Air Quality, Atmosphere & Health

Ziyu Wang¹, Jingjing Feng¹˒², Qingyan Fu³, Song Gao³, Xiaojia Chen¹ & Jinping Cheng¹

Abstract

The intensive monitoring of air pollutants has led to the acquisition of vast quantities of data. Traditional quality control methods based on existing knowledge may be inefficient because of our limited understanding of the interaction between human activities and stochastic environmental factors. Moreover, traditional methods for outlier detection may be misleading because of the existence of valid outliers and invalid inliers. In this research, artificial neural networks (ANNs) are developed to identify instrument failure based on current and historical observations. Two structures, i.e., multilayer perceptrons and recurrent networks, are trained using 50,000 hourly data points labeled by human reviewers. The most conservative model identified 57.5% of the invalid sulfur compound observations and 44.9% of the invalid nitrogen compound observations. By setting a more liberal threshold, these values increased to 76.0% and 79.7%, respectively. Except for SO₂, the ANNs outperformed the traditional methods for data quality control, as demonstrated with a plausibility test, a test of temporal consistency, and a residual analysis. Compared with the test of temporal consistency, which was the most effective traditional method studied, the true positive rates of the ANNs were 19.4% to 29.5% higher for all pollutants except SO₂, given the same false positive rates. The results indicate the effectiveness of ANNs for data quality control even without supplementary information. Methods for performance improvement are discussed.
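
The abstract compares detectors by their true positive rates at the same false positive rate, and mentions two traditional checks (a plausibility test and a test of temporal consistency). The following is a minimal sketch of how such a comparison could be set up; it is not the authors' implementation. All function names, the plausible range, and the toy hourly series with its labels are assumptions made for illustration, and the temporal consistency score is simplified here to the absolute hour-to-hour change.

```python
from typing import List, Sequence, Tuple


def plausibility_flags(values: Sequence[float], lower: float, upper: float) -> List[bool]:
    """Flag observations that fall outside an assumed plausible range [lower, upper]."""
    return [not (lower <= v <= upper) for v in values]


def temporal_consistency_scores(values: Sequence[float]) -> List[float]:
    """Score each hour by its absolute change from the previous hour.

    The first observation has no predecessor and receives a score of 0.0.
    """
    scores = [0.0]
    for prev, curr in zip(values, values[1:]):
        scores.append(abs(curr - prev))
    return scores


def rates_at_threshold(
    scores: Sequence[float], labels: Sequence[bool], threshold: float
) -> Tuple[float, float]:
    """Return (TPR, FPR) when every score >= threshold is flagged as invalid.

    labels[i] is True when a reviewer marked observation i as invalid.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= threshold)
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    tpr = tp / positives if positives else 0.0
    fpr = fp / negatives if negatives else 0.0
    return tpr, fpr


def tpr_at_fpr(scores: Sequence[float], labels: Sequence[bool], max_fpr: float) -> float:
    """Highest TPR reached by any threshold whose FPR does not exceed max_fpr."""
    best = 0.0
    for threshold in sorted(set(scores)):
        tpr, fpr = rates_at_threshold(scores, labels, threshold)
        if fpr <= max_fpr:
            best = max(best, tpr)
    return best


# Toy hourly concentrations (hypothetical values and labels, True = invalid).
hourly = [12.0, 13.0, 12.5, 80.0, 12.0, 11.5, -3.0, 12.0]
invalid = [False, False, False, True, False, False, True, False]

range_flags = plausibility_flags(hourly, lower=0.0, upper=500.0)
jump_scores = temporal_consistency_scores(hourly)

# A stricter FPR budget corresponds to a more conservative threshold,
# a looser budget to a more liberal one.
conservative_tpr = tpr_at_fpr(jump_scores, invalid, max_fpr=0.2)
liberal_tpr = tpr_at_fpr(jump_scores, invalid, max_fpr=0.34)
```

In this toy series the plausibility test catches only the negative reading, and the jump score also penalizes the valid hour that follows the spike, which is one way a simple rule can produce the false positives that the abstract's matched-rate comparison accounts for. Scores from any other detector, such as the output probability of a trained network, could be passed to `tpr_at_fpr` in the same way.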

Funding

This study was funded by the National Natural Science Foundation of China (Projects 21577090 and 21777094), the National Science and Technology Support Program (Project 2014BAC22B07), and the Shanghai Jiao Tong University China Institute for Urban Governance (Project SJTU-2019UGBD-01).

Author information

Authors and Affiliations

School of Environmental Science and Engineering, Shanghai Jiao Tong University, Dongchuan Road 800, Shanghai, 200230, China
Ziyu Wang, Jingjing Feng, Xiaojia Chen & Jinping Cheng
School of Environment, Jinan University, Xingye Road 855, Guangzhou, 511486, China
Jingjing Feng
Shanghai Environmental Monitoring Center, Shanghai, 200230, China
Qingyan Fu & Song Gao

Corresponding authors

Correspondence to Jingjing Feng or Jinping Cheng.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

ESM 1

(DOCX 728 kb)

About this article

Cite this article

Wang, Z., Feng, J., Fu, Q. et al. Quality control of online monitoring data of air pollutants using artificial neural networks. Air Qual Atmos Health 12, 1189–1196 (2019). https://doi.org/10.1007/s11869-019-00734-4

Received: 25 April 2019
Accepted: 31 July 2019
Published: 07 August 2019
Issue Date: October 2019
DOI: https://doi.org/10.1007/s11869-019-00734-4
