Advances in Atmospheric Sciences, Volume 35, Issue 12, pp 1522–1532

Probabilistic Automatic Outlier Detection for Surface Air Quality Measurements from the China National Environmental Monitoring Network

  • Huangjian Wu
  • Xiao Tang
  • Zifa Wang
  • Lin Wu
  • Miaomiao Lu
  • Lianfang Wei
  • Jiang Zhu
Original Paper


Although quality assurance and quality control procedures are routinely applied in most air quality networks, outliers can still occur due to instrument malfunctions, the influence of harsh environments and the limitation of measuring methods. Such outliers pose challenges for data-powered applications such as data assimilation, statistical analysis of pollution characteristics and ensemble forecasting. Here, a fully automatic outlier detection method was developed based on the probability of residuals, which are the discrepancies between the observed and the estimated concentration values. The estimation can be conducted using filtering—or regressions when appropriate—to discriminate four types of outliers characterized by temporal and spatial inconsistency, instrument-induced low variances, periodic calibration exceptions, and less PM10 than PM2.5 in concentration observations, respectively. This probabilistic method was applied to detect all four types of outliers in hourly surface measurements of six pollutants (PM2.5, PM10, SO2, NO2, CO and O3) from 1436 stations of the China National Environmental Monitoring Network during 2014–16. Among the measurements, 0.65%–5.68% are marked as outliers, with PM10 and CO more prone to outliers. Our method successfully identifies a trend of decreasing outliers from 2014 to 2016, which corresponds to known improvements in the quality assurance and quality control procedures of the China National Environmental Monitoring Network. The outliers can have a significant impact on the annual mean concentrations of PM2.5, with differences exceeding 10 μg m−3 at 66 sites.
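The residual-probability idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the filter design or the residual distribution, so this example swaps in a centred running median as the smoother, assumes normally distributed residuals with a scale taken from the median absolute deviation, and uses hypothetical names (`running_median`, `residual_probabilities`, `flag_outliers`) and a hypothetical threshold `p_threshold`.

```python
import math
import statistics


def running_median(series, half_window=2):
    """Centred running median; the window shrinks at the series edges.

    Assumption: a robust smoother standing in for the low-pass filter
    mentioned in the abstract.
    """
    n = len(series)
    return [
        statistics.median(series[max(0, i - half_window):min(n, i + half_window + 1)])
        for i in range(n)
    ]


def residual_probabilities(series, half_window=2):
    """Two-sided normal tail probability of each observed-minus-estimated residual."""
    estimate = running_median(series, half_window)
    residuals = [obs - est for obs, est in zip(series, estimate)]
    centre = statistics.median(residuals)
    # Median absolute deviation, rescaled so it matches sigma for normal data.
    mad = statistics.median([abs(r - centre) for r in residuals])
    sigma = 1.4826 * mad
    if sigma == 0:
        # Degenerate case: the scale estimate collapses to zero.
        return [1.0 if r == centre else 0.0 for r in residuals]
    return [math.erfc(abs(r - centre) / (sigma * math.sqrt(2))) for r in residuals]


def flag_outliers(series, half_window=2, p_threshold=1e-3):
    """Mark observations whose residual probability falls below p_threshold."""
    return [p < p_threshold for p in residual_probabilities(series, half_window)]


# Hypothetical hourly PM2.5 values (μg m−3) with one spike at index 5.
hourly_pm25 = [30, 31, 32, 31, 30, 300, 31, 32, 30, 31, 32, 31]
print(flag_outliers(hourly_pm25))
```

A running median is used here because a plain moving average would absorb part of the spike and inflate the residuals of its neighbours. Any smoother that is insensitive to isolated spikes could take its place.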

Key words

probabilistic automatic outlier detection; air quality observation; low pass filter; spatial regression; bivariate normal distribution

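Abstract (translated from Chinese)

Ambient air quality monitoring networks are the most direct and important channel for understanding air quality and a key data source for much air pollution research. However, because of instrument malfunctions, harsh environments and the limitations of measurement methods, anomalous data occur from time to time and pose serious challenges for air pollution research and management. To address this problem, this paper designs a fully automatic method for identifying anomalous data. The method fits the monitoring data using approaches such as low-pass filtering and spatial regression, computes the probability of each residual from the distribution of the fitting residuals, and identifies low-probability data as outliers. According to their characteristics, the outliers are divided into four types: temporal and spatial inconsistency, low variance, periodic anomalies, and particulate matter inversion (PM10 lower than PM2.5). A corresponding fitting and estimation method is designed for each type so that it can be identified effectively. The method was applied to identify and analyse outliers in the monitoring data for six routine pollutants (PM2.5, PM10, SO2, NO2, CO and O3) at 1436 national monitoring stations across China during 2014–2016. The results show that outliers account for 0.65%–5.68% of the data for the six pollutants, with PM10 and CO having the highest proportions, and that outliers change the annual mean PM2.5 concentration by more than 10 μg m−3 at 66 stations. In addition, the proportion of outliers for each pollutant declined gradually from 2014 to 2016, indicating that monitoring technology and management have been steadily improving.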
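The spatial-regression variant mentioned in this abstract can be sketched in the same spirit. The example below is an illustration under stated assumptions, not the paper's method: it regresses one station on the mean of its neighbours with ordinary least squares over a history assumed to be outlier-free, treats the residuals as normally distributed, and uses hypothetical names (`fit_spatial_regression`, `residual_probability`) and made-up concentrations.

```python
import math


def fit_spatial_regression(target_history, neighbour_mean_history):
    """Ordinary least squares for target ≈ intercept + slope * neighbour mean.

    Returns (intercept, slope, sigma), where sigma is the residual standard
    deviation with n - 2 degrees of freedom.
    Assumption: the history contains no outliers.
    """
    n = len(target_history)
    mean_x = sum(neighbour_mean_history) / n
    mean_y = sum(target_history) / n
    sxx = sum((x - mean_x) ** 2 for x in neighbour_mean_history)
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(neighbour_mean_history, target_history)
    )
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum(
        (y - (intercept + slope * x)) ** 2
        for x, y in zip(neighbour_mean_history, target_history)
    )
    sigma = math.sqrt(sse / (n - 2))
    return intercept, slope, sigma


def residual_probability(observed, neighbour_mean, model):
    """Two-sided normal tail probability of the residual for one new observation."""
    intercept, slope, sigma = model
    residual = observed - (intercept + slope * neighbour_mean)
    if sigma == 0:
        # Degenerate case: the history was fitted exactly.
        return 1.0 if residual == 0 else 0.0
    return math.erfc(abs(residual) / (sigma * math.sqrt(2)))


# Hypothetical hourly concentrations (μg m−3).
neighbour_mean_history = [22, 32, 42, 52, 62, 52, 42, 32]
target_history = [23, 31, 43, 51, 63, 53, 41, 33]
model = fit_spatial_regression(target_history, neighbour_mean_history)

print(residual_probability(51, 50, model))   # consistent with the neighbours
print(residual_probability(150, 50, model))  # far above what the neighbours suggest
```

Fitting on a separate history keeps a suspect observation from distorting the regression coefficients that are then used to judge it.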


air quality monitoring data; automatic outlier identification; residual probability distribution





Acknowledgements

The authors express their sincere gratitude to the CNEMC for providing the air quality observations for the period 2014–16. This study was supported by the National Natural Science Foundation (Grant Nos. 91644216 and 41575128), the CAS Information Technology Program (Grant No. XXH13506-302) and the Guangdong Provincial Science and Technology Development Special Fund (No. 2017B020216007).



Copyright information

© Chinese National Committee for International Association of Meteorology and Atmospheric Sciences, Institute of Atmospheric Physics, Science Press and Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  • Huangjian Wu (1, 3)
  • Xiao Tang (1)
  • Zifa Wang (1, 3)
  • Lin Wu (1)
  • Miaomiao Lu (1)
  • Lianfang Wei (1)
  • Jiang Zhu (2)
  1. State Key Laboratory of Atmospheric Boundary Layer Physics and Atmospheric Chemistry, Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing, China
  2. International Center for Climate and Environment Sciences, Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing, China
  3. University of Chinese Academy of Sciences, Beijing, China
