Abstract
Anomaly detection focuses on identifying examples in the data that somehow deviate from what is expected or typical. Algorithms for this task usually assign a score to each example that represents how anomalous the example is. Then, a threshold on the scores turns them into concrete predictions. However, each algorithm uses a different approach to assign the scores, which makes them difficult to interpret and can quickly erode a user’s trust in the predictions. This paper introduces an approach for assessing the reliability of any anomaly detector’s example-wise predictions. To do so, we propose a Bayesian approach for converting anomaly scores to probability estimates. This enables the anomaly detector to assign a confidence score to each prediction which captures its uncertainty in that prediction. We theoretically analyze the convergence behaviour of our confidence estimate. Empirically, we demonstrate the effectiveness of the framework in quantifying a detector’s confidence in its predictions on a large benchmark of datasets.
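The idea described above can be illustrated with a minimal sketch. The function below is a hypothetical, simplified construction in the same spirit as the paper's approach (the function name, the Beta(1,1) prior, and the thresholding rule are illustrative assumptions, not the authors' exact method): it estimates the probability that a test example's anomaly score exceeds the detector's decision threshold, using a Bayesian estimate of the score's position within the training scores.

```python
import math

def confidence_of_anomaly_prediction(train_scores, test_score, contamination):
    """Sketch of a Bayesian confidence estimate for an anomaly prediction.

    Assumptions (illustrative, not the paper's exact formulation):
    - the detector flags the top `contamination` fraction of scores as
      anomalies, i.e. the threshold is the (1 - contamination) quantile
      of the training scores;
    - a Beta(1, 1) prior is placed on the probability that a training
      score falls below `test_score`.
    """
    n = len(train_scores)
    # Posterior-mean (Laplace-smoothed) estimate of
    # P(training score <= test_score) under a Beta(1, 1) prior.
    below = sum(s <= test_score for s in train_scores)
    p_hat = (below + 1) / (n + 2)
    # The example is predicted anomalous if its score ranks above the
    # floor(n * (1 - contamination))-th training score.
    k = math.floor(n * (1 - contamination))
    # Confidence = probability that at least k of n training scores fall
    # below the test score, under a Binomial(n, p_hat) model.
    return sum(
        math.comb(n, j) * p_hat**j * (1 - p_hat) ** (n - j)
        for j in range(k, n + 1)
    )
```

A score far above all training scores yields a confidence near 1, while a score below them yields a confidence near 0; scores near the threshold produce intermediate values, which is exactly where the prediction is least reliable.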
Notes
- 1.
We assume that \(k \in \mathbb{N}\), taking the floor function when needed.
- 2.
A failure would correspond to a training example having a higher anomaly score than the chosen threshold. Given the assumption that all training examples are normal, this would indicate a false positive.
- 3.
Implementation available at: https://github.com/Lorenzo-Perini/Confidence_AD.
Acknowledgements
This work is supported by the Flemish government under the “Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen” programme (JD, LP, VV), FWO (G0D8819N to JD), and KU Leuven Research Fund (C14/17/07 to JD).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Perini, L., Vercruyssen, V., Davis, J. (2021). Quantifying the Confidence of Anomaly Detectors in Their Example-Wise Predictions. In: Hutter, F., Kersting, K., Lijffijt, J., Valera, I. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2020. Lecture Notes in Computer Science(), vol 12459. Springer, Cham. https://doi.org/10.1007/978-3-030-67664-3_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67663-6
Online ISBN: 978-3-030-67664-3