Abstract
Classification tasks usually assume that all possible classes are present during the training phase. This assumption is restrictive when an algorithm is deployed over a long period and may encounter samples from previously unseen classes. It is therefore essential to develop algorithms that can distinguish between normal and abnormal test data. In recent years, extreme value theory has become an important tool in multivariate statistics and machine learning. The recently introduced extreme value machine, a classifier motivated by extreme value theory, addresses this problem and achieves competitive performance in specific cases. We show that this algorithm has theoretical and practical drawbacks and can fail even when the recognition task is fairly simple. To overcome these limitations, we propose two new algorithms for anomaly detection that rely on approximations from extreme value theory and are more robust in such cases. We exploit the intuition that test points lying extremely far from the training classes are more likely to be abnormal, and we derive asymptotic results motivated by univariate extreme value theory that make this intuition precise. We demonstrate the effectiveness of our classifiers in simulations and on real data sets.
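The core intuition above — that a test point far from the training class is likely abnormal, with the tail of the distance distribution modelled via extreme value theory — can be illustrated with a minimal peaks-over-threshold sketch. This is not the paper's algorithm, only an illustration of the general idea: the class center, the 90% threshold, and the Euclidean distance are all arbitrary choices made for the example, and `scipy.stats.genpareto` supplies the generalized Pareto fit.

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)

# Hypothetical training class: points clustered around the origin.
train = rng.normal(0.0, 1.0, size=(1000, 2))
center = train.mean(axis=0)

# Distances of training points from the class center.
dist = np.linalg.norm(train - center, axis=1)

# Peaks-over-threshold: fit a GPD to exceedances above a high
# empirical quantile (the threshold choice is a modelling decision).
u = np.quantile(dist, 0.9)
exc = dist[dist > u] - u
shape, _, scale = genpareto.fit(exc, floc=0.0)

def tail_probability(x):
    """Estimated probability that a training-class point lies farther
    from the center than x; small values flag likely anomalies."""
    d = np.linalg.norm(x - center)
    if d <= u:
        return np.mean(dist > d)          # empirical estimate below the threshold
    p_u = np.mean(dist > u)               # probability of exceeding the threshold
    return p_u * genpareto.sf(d - u, shape, loc=0.0, scale=scale)

# A point near the class gets a much larger tail probability
# than a point extremely far from it.
print(tail_probability(np.array([0.5, -0.5])) >
      tail_probability(np.array([8.0, 8.0])))  # True
```

The GPD fit lets the score extrapolate beyond the largest observed training distance, where a purely empirical estimate would be identically zero and could not rank abnormal points.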
Acknowledgments
Edoardo Vignotto acknowledges funding from the Swiss National Science Foundation (Doc.Mobility Grant 188229). We gratefully acknowledge helpful comments by two anonymous referees and the editorial board. Sebastian Engelke was supported by the Swiss National Science Foundation; the paper was completed while he was a visitor at the Department of Statistical Sciences, University of Toronto.
Funding
Open access funding provided by University of Geneva.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Vignotto, E., Engelke, S. Extreme value theory for anomaly detection – the GPD classifier. Extremes 23, 501–520 (2020). https://doi.org/10.1007/s10687-020-00393-0