Outlier detection with partial information: application to emergency mapping
This paper addresses the problem of novelty detection when the observed data are a mixture of a known ‘background’ process contaminated by an unknown process that generates the outliers, or novel observations. The framework we describe is quite general: it performs univariate classification with incomplete information, relying only on knowledge of the distribution (the probability density function, pdf) of the data generated by the ‘background’ process. The relative proportion of the ‘background’ component (its prior probability), as well as the pdfs and prior probabilities of all other components, are assumed unknown. The main contribution is a new classification scheme that identifies the maximum proportion of the observed data that follows the known ‘background’ distribution. The method uses the Kolmogorov–Smirnov test to estimate this proportion, after which the data are separated in a Bayes-optimal way. Results on synthetic data show that this approach can produce more reliable results than a standard novelty detection scheme. The classification algorithm is then applied to identifying outliers in the SIC2004 data set, in order to detect the radioactive release simulated in the ‘joker’ data set. We propose this method as a reliable means of novelty detection in emergency situations; it can also be used to identify outliers before a more general automatic mapping algorithm is applied.
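The abstract does not say exactly how the Kolmogorov–Smirnov test is used to estimate the proportion, so the following Python sketch is one plausible reading under stated assumptions, not the authors' algorithm. It assumes a Gaussian N(0, 1) background and bounds the background proportion alpha by requiring the mixture constraints alpha·F0(x) ≤ F(x) and alpha·(1 − F0(x)) ≤ 1 − F(x) to hold pointwise for some F inside a KS confidence band around the empirical CDF. It then applies the Bayes rule "background if alpha·f0(x) ≥ (1 − alpha)·g(x)", which reduces to 2·alpha·f0(x) ≥ f(x) with the mixture density f replaced by a Gaussian kernel density estimate. The function names, the critical value 1.358 (asymptotic, 5% level), the Silverman bandwidth and the synthetic data are illustrative assumptions.

```python
import math
import random


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF F0 of the assumed Gaussian 'background' process."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def normal_pdf(x, mu=0.0, sigma=1.0):
    """pdf f0 of the assumed Gaussian 'background' process."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def max_background_proportion(data, bg_cdf, c_ks=1.358):
    """Largest alpha compatible, within a KS band of half-width c_ks/sqrt(n),
    with the mixture constraints alpha*F0(x) <= F(x) and
    alpha*(1 - F0(x)) <= 1 - F(x). Illustrative sketch, not the paper's method."""
    xs = sorted(data)
    n = len(xs)
    eps = c_ks / math.sqrt(n)  # assumed asymptotic 5% KS critical value
    alpha = 1.0
    for k, x in enumerate(xs, start=1):
        # Clip to avoid division by zero far out in the tails.
        u = min(max(bg_cdf(x), 1e-12), 1.0 - 1e-12)
        # Just below x_(k) the ECDF equals (k-1)/n: lower-tail constraint.
        alpha = min(alpha, ((k - 1) / n + eps) / u)
        # At x_(k) the ECDF equals k/n: upper-tail constraint.
        alpha = min(alpha, (1.0 - k / n + eps) / (1.0 - u))
    return alpha


def kde(x, data, h):
    """Gaussian kernel density estimate of the mixture density f at x."""
    s = sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in data)
    return s / (len(data) * h * math.sqrt(2.0 * math.pi))


def classify_background(data, alpha, bg_pdf):
    """Bayes rule: label x 'background' when alpha*f0(x) >= (1-alpha)*g(x).
    Because (1-alpha)*g = f - alpha*f0, this is 2*alpha*f0(x) >= f(x)."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in data) / (n - 1))
    h = 1.06 * sd * n ** -0.2  # Silverman's rule-of-thumb bandwidth
    return [2.0 * alpha * bg_pdf(x) >= kde(x, data, h) for x in data]


# Synthetic demo (hypothetical data): 540 N(0,1) background points
# plus 60 N(6, 0.5^2) outliers.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(540)] + [rng.gauss(6.0, 0.5) for _ in range(60)]
alpha_hat = max_background_proportion(data, normal_cdf)
is_background = classify_background(data, alpha_hat, normal_pdf)
print(f"estimated background proportion: {alpha_hat:.3f}")
print(f"points flagged as outliers: {is_background.count(False)}")
```

Because only the two tail constraints are checked, this estimate is an upper bound on alpha and is most informative when the novel observations lie in a tail of the background distribution; contamination hidden in the bulk of the background would also require checking that the implied outlier CDF, (F − alpha·F0)/(1 − alpha), is non-decreasing.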
Keywords: Mixture model · Prior probability · Expectation maximization algorithm · Novelty detection · Maximum proportion
This work was partially supported by BBSRC contract 92/EGM17737 and partially funded by the European Commission under the Sixth Framework Programme, Contract No. 033811 with DG INFSO, Action Line IST-2005-2.5.12 ICT for Environmental Risk Management. The SIC2004 data were obtained from http://www.ai-geostats.org. The views expressed herein are those of the authors and are not necessarily those of the European Commission.