Abstract
Increasing automation in many areas of industry calls for efficient machine learning solutions for detecting abnormal events. With the ubiquitous deployment of sensors that monitor the health of complex infrastructures nearly continuously, anomaly detection can now rely on measurements sampled at very high frequency, which provide a rich representation of the phenomenon under surveillance. To fully exploit the information thus collected, the observations can no longer be treated as multivariate data, and a functional analysis approach is required. The purpose of this paper is to investigate the performance of recent techniques for anomaly detection in the functional setup on real datasets. After an overview of the state of the art and a visual-descriptive study, a variety of anomaly detection methods are compared. While taxonomies of abnormalities (e.g., shape, location) in the functional setup are documented in the literature, assigning a specific type to an identified anomaly appears to be a challenging task. The strengths and weaknesses of existing approaches are therefore benchmarked with respect to these anomaly types in a simulation study. The methods are then evaluated on two datasets, related to the monitoring of helicopters in flight and to the spectrometry of construction materials, respectively. The benchmark analysis concludes with recommendations for practitioners.
Data Availability
The datasets are available at the following link: https://drive.google.com/drive/folders/1p1k5eRwSPDH_BP6E8j_iLMCaUtEfLOkN?usp=sharing.
Code Availability
The code is available at the following link: https://drive.google.com/drive/folders/1p1k5eRwSPDH_BP6E8j_iLMCaUtEfLOkN?usp=sharing.
Funding
This work has been funded by BPI France in the context of the PSPC Project Expresso (2017–2021). This project also received financial support from the initiative “Forschungspartnerschaften Mineralrohstoffe - ein strategischer Forschungsschwerpunkt der Geologischen Bundesanstalt”. The spectroscopic data of sedimentary material were provided by the Geological Survey of Austria.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Appendices
Appendix A: Benchmarked datasets
Fig. 8 displays the aeronautics and rocks datasets.
Appendix B: Additional experiments on simulated anomalies
This appendix reports experiments complementary to those of Sect. 3. They follow the same methodology but vary the proportion of anomalies: 1% in Table 5, 2% in Table 6, 3% in Table 7, and 4% in Table 8.
About this article
Cite this article
Staerman, G., Adjakossa, E., Mozharovskyi, P. et al. Functional anomaly detection: a benchmark study. Int J Data Sci Anal 16, 101–117 (2023). https://doi.org/10.1007/s41060-022-00366-5
DOI: https://doi.org/10.1007/s41060-022-00366-5