Probabilistic Automatic Outlier Detection for Surface Air Quality Measurements from the China National Environmental Monitoring Network
Although quality assurance and quality control procedures are routinely applied in most air quality networks, outliers can still occur due to instrument malfunctions, harsh environments, and limitations of measurement methods. Such outliers pose challenges for data-driven applications such as data assimilation, statistical analysis of pollution characteristics, and ensemble forecasting. Here, a fully automatic outlier detection method was developed based on the probability of residuals, which are the discrepancies between the observed and the estimated concentration values. The estimation can be conducted using filtering or, when appropriate, regression, to discriminate four types of outliers, characterized respectively by temporal and spatial inconsistency, instrument-induced low variance, periodic calibration exceptions, and PM10 concentrations lower than PM2.5 concentrations. This probabilistic method was applied to detect all four types of outliers in hourly surface measurements of six pollutants (PM2.5, PM10, SO2, NO2, CO, and O3) from 1436 stations of the China National Environmental Monitoring Network during 2014–16. Depending on the pollutant, 0.65%–5.68% of the measurements are marked as outliers, with PM10 and CO the most prone to outliers. The method also reveals a decreasing trend in outliers from 2014 to 2016, consistent with known improvements in the quality assurance and quality control procedures of the network. Outliers can have a significant impact on annual mean PM2.5 concentrations, with differences exceeding 10 μg m−3 at 66 sites.
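The residual-probability idea can be illustrated for the temporal-inconsistency case. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a centred moving average as the low-pass filter, approximately normal residuals with a robust scale (median absolute deviation), and a two-sided tail-probability threshold. The function name `lowpass_residual_outliers` and the `window` and `alpha` values are hypothetical.

```python
import math

import numpy as np


def lowpass_residual_outliers(series, window=5, alpha=1e-3):
    """Flag temporally inconsistent values in an hourly concentration series.

    Hypothetical sketch: a centred moving average stands in for the
    low-pass filter, and residuals are assumed to be roughly normal.
    `window` and `alpha` are illustrative, not the published settings.
    Assumes at least two observations.
    """
    x = np.asarray(series, dtype=float)
    half = window // 2
    estimate = np.empty_like(x)
    for i in range(len(x)):
        # Leave-one-out moving average: the value under test does not
        # pull its own estimate toward itself.
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        estimate[i] = np.concatenate([x[lo:i], x[i + 1:hi]]).mean()
    residual = x - estimate

    # Robust scale: median absolute deviation rescaled to a normal sigma.
    centre = np.median(residual)
    sigma = 1.4826 * np.median(np.abs(residual - centre))
    if sigma == 0:
        sigma = np.finfo(float).eps
    z = np.abs(residual - centre) / sigma

    # Two-sided tail probability of each residual under a standard normal;
    # low-probability residuals are marked as outliers.
    prob = np.array([math.erfc(v / math.sqrt(2.0)) for v in z])
    return prob < alpha, prob


if __name__ == "__main__":
    hourly = [30, 32, 31, 33, 300, 34, 32, 31, 30, 33]  # synthetic values
    flags, prob = lowpass_residual_outliers(hourly)
    print(flags.nonzero()[0])  # prints [4]
```

A single large spike also inflates the moving-average estimates of its neighbours, so their residuals grow as well; the robust scale keeps them below the threshold in this example, but clustered outliers may call for an iterative or median-based estimate.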
Key words: probabilistic automatic outlier detection, air quality observation, low-pass filter, spatial regression, bivariate normal distribution
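Ambient air quality monitoring networks are the most direct and important means of understanding air quality and a key data source for many air pollution studies. However, because of instrument malfunctions, harsh environments, measurement methods, and other causes, outliers occur from time to time, posing serious challenges to air pollution research and management. To address this problem, this paper designs a fully automatic outlier detection method. The method fits the monitoring data using low-pass filtering, spatial regression, and similar approaches, computes residual probabilities from the distribution of the fitting residuals, and identifies low-probability data as outliers. Based on their anomalous characteristics, outliers are classified into four types: spatiotemporal inconsistency, low variation, periodic anomalies, and particulate matter inversion (PM10 lower than PM2.5). For each type, a corresponding fitting and estimation method is designed to identify the outliers effectively. Using this method, we identified and analyzed outliers in the 2014–2016 measurements of six criteria pollutants (PM2.5, PM10, SO2, NO2, CO, and O3) from 1436 national monitoring stations across China. The results show that outliers account for 0.65%–5.68% of the data for the six pollutants, with PM10 and CO having the highest outlier proportions, and that at 66 sites the outliers shift the annual mean PM2.5 concentration by more than 10 μg m−3. In addition, the proportion of outliers for each pollutant decreased steadily from 2014 to 2016, indicating continuous improvement in monitoring technology and management.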
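The spatial-regression estimation step can be sketched as follows. This is an illustrative assumption rather than the paper's exact model: the target station's hourly series is regressed on concurrent series from K neighbouring stations by ordinary least squares, each residual is scored by its two-sided normal tail probability using a robust scale, and low-probability values are flagged. The function name `spatial_regression_outliers`, the choice of neighbours, and the `alpha` threshold are hypothetical.

```python
import math

import numpy as np


def spatial_regression_outliers(target, neighbours, alpha=1e-3):
    """Flag values at one station that disagree with nearby stations.

    Hypothetical sketch: the target series is regressed on concurrent
    series from neighbouring stations by ordinary least squares.
    `alpha` is an illustrative threshold, not a published setting.

    target:     shape (T,), hourly concentrations at the station
    neighbours: shape (T, K), concurrent values at K nearby stations
    """
    y = np.asarray(target, dtype=float)
    # Design matrix: intercept column plus one column per neighbour.
    X = np.column_stack([np.ones(len(y)), np.asarray(neighbours, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual = y - X @ coef

    # Robust scale (median absolute deviation) and two-sided normal tail
    # probability of each residual.
    centre = np.median(residual)
    sigma = 1.4826 * np.median(np.abs(residual - centre))
    if sigma == 0:
        sigma = np.finfo(float).eps
    z = np.abs(residual - centre) / sigma
    prob = np.array([math.erfc(v / math.sqrt(2.0)) for v in z])
    return prob < alpha, prob


if __name__ == "__main__":
    t = np.arange(1, 11)
    neighbours = t.reshape(-1, 1)  # one synthetic neighbouring station
    target = 2 * t + (-1) ** t     # tracks the neighbour with small noise
    target[4] += 100               # injected spike at hour index 4
    flags, _ = spatial_regression_outliers(target, neighbours)
    print(flags.nonzero()[0])      # prints [4]
```

Ordinary least squares lets a spike pull the fitted line toward itself; with several simultaneous outliers, a robust regression (for example, Huber loss or iteratively reweighted least squares) would be a safer estimator.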
Key words: air quality monitoring data, automatic outlier identification, residual probability distribution
The authors express their sincere gratitude to the China National Environmental Monitoring Centre (CNEMC) for providing the air quality observations for 2014–16. This study was supported by the National Natural Science Foundation of China (Grant Nos. 91644216 and 41575128), the CAS Information Technology Program (Grant No. XXH13506-302), and the Guangdong Provincial Science and Technology Development Special Fund (Grant No. 2017B020216007).