A meta-level analysis of online anomaly detectors

Abstract

Real-time detection of anomalies in streaming data is receiving increasing attention as it allows us to raise alerts, predict faults, and detect intrusions or threats across industries. Yet, little attention has been given to compare the effectiveness and efficiency of anomaly detectors for streaming data (i.e., of online algorithms). In this paper, we present a qualitative, synthetic overview of major online detectors from different algorithmic families (i.e., distance, density, tree or projection based) and highlight their main ideas for constructing, updating and testing detection models. Then, we provide a thorough analysis of the results of a quantitative experimental evaluation of online detection algorithms along with their offline counterparts. The behavior of the detectors is correlated with the characteristics of different datasets (i.e., meta-features), thereby providing a meta-level analysis of their performance. Our study addresses several missing insights from the literature such as (a) how reliable are detectors against a random classifier and what dataset characteristics make them perform randomly; (b) to what extent online detectors approximate the performance of offline counterparts; (c) which sketch strategy and update primitives of detectors are best to detect anomalies visible only within a feature subspace of a dataset; (d) what are the trade-offs between the effectiveness and the efficiency of detectors belonging to different algorithmic families; (e) which specific characteristics of datasets yield an online algorithm to outperform all others.

Notes

  1. In this paper, we use the terms outlier, novelty and anomaly detection interchangeably.

  2. We call “sample” an element (i.e., observation, measurement) of a data stream.

  3. A hyper-parameter is a parameter whose value cannot be estimated from the data.

  4. The recently proposed distance-based detector NETS [73] consumes fewer resources than MCOD but does not report any improvement in terms of effectiveness.

  5. Nearest neighbors are distinguished according to maximum and average distance.

  6. \(2^{h+1} - 1\) is the number of nodes in a perfect binary tree.

  7. https://aws.amazon.com/kinesis/

  8. \(2 n - 1\) is the number of nodes of a full binary tree with n leaves.

  9. We consider the coordinates of each point as (F2, F1).

  10. In our implementation, we consider the simple forgetting mechanism rather than the time-decaying mechanism.

  11. The authors report that the value \(\gamma =0.01\) is optimal for most datasets.

  12. https://github.com/Waikato/moa

  13. https://infolab.usc.edu/Luan/Outlier/CountBasedWindow/DODDS/src/outlierdetection/

  14. https://github.com/kaist-dmlab/STARE

  15. https://github.com/tranvanluan2/cpod

  16. https://github.com/cmuxstream/cmuxstream-core

  17. Online version: http://agents.fel.cvut.cz/stegodata/tools/ Offline Version: https://github.com/yzhao062/pyod

  18. https://github.com/bedanta01/Subspace-Outlier-Detection

  19. https://scikit-learn.org/

  20. https://github.com/ngoix/OCRF

  21. After removing null values and categorical features not treated by our anomaly detectors.

  22. https://www.ipd.kit.edu/~muellere/HiCS/

  23. There are also some cases of unknown anomalies that may appear in any of those categories.

  24. The k value is automatically computed during training.

  25. When p is stored in a leaf of size larger than one, the value of \(h(p)\) is adjusted by \(c(size)\).

  26. In most experimental studies [15, 24, 25], detectors were executed using the default values of their hyper-parameters as “recommended by their authors.”

  27. RS does not require computing the gradient of the function to be optimized and hence can be used on functions that are not continuous or differentiable [31].

  28. https://en.wikipedia.org/wiki/Coefficient_of_variation

References

  1. Aggarwal, C.: An Introduction to Outlier Analysis, pp. 1–40 (2013)

  2. Aggarwal, C., Hinneburg, A., Keim, D.: On the surprising behavior of distance metrics in high dimensional spaces. In: ICDT (2001)

  3. Aggarwal, C., Sathe, S.: Theoretical foundations and algorithms for outlier ensembles. SIGKDD Explor. 17(1), 74 (2015)

  4. Aggarwal, C., Sathe, S.: Outlier Ensembles-An Introduction. Springer, Berlin (2017)

  5. Akidau, T., Bradshaw, R., Chambers, C., Chernyak, S., Fernández-Moctezuma, R., Lax, R., McVeety, S., Mills, D., Perry, F., Schmidt, E., Whittle, S.: The dataflow model: a practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing. PVLDB 8(12), 68 (2015)

  6. Alcobaça, E., Siqueira, F., Rivolli, A., Garcia, L., Oliva, J., de Carvalho, A.: Mfe: towards reproducible meta-feature extraction. JMLR 21(111), 1–5 (2020)

  7. Bailis, P., Gan, E., Madden, S., Narayanan, D., Rong, K., Suri, S.: Macrobase: prioritizing attention in fast data. In: SIGMOD (2017)

  8. Ben-Haim, Y., Tom-Tov, E.: A streaming parallel decision tree algorithm. JMLR 11, 994 (2010)

  9. Bergmeir, C., Benítez, M.: On the use of cross-validation for time series predictor evaluation. Inf. Sci. 7, 191 (2012)

  10. Birge, L., Rozenholc, Y.: How many bins should be put in a regular histogram. In: ESAIM: Probability and Statistics, pp. 24–45 (2006)

  11. Blázquez-García, A., Conde, A., Mori, U., Lozano, J.: A review on outlier/anomaly detection in time series data. ACM Comput. Surv. 54(3), 98 (2021)

  12. Braei, M., Wagner, S.: Anomaly detection in univariate time-series: a survey on the state-of-the-art. CoRR, arxiv:2004.00433 (2020)

  13. Branco, P., Torgo, L., Ribeiro, R.: A survey of predictive modelling under imbalanced distributions. CoRR, arxiv:1505.01658 (2015)

  14. Breunig, M., Kriegel, H., Ng, R., Sander, J.: Lof: identifying density-based local outliers. SIGMOD Rec. 29(2), 799 (2000)

  15. Campos, G., Zimek, A., Sander, J., Campello, R., Micenková, B., Schubert, E., Assent, I., Houle, M.: On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study. Data Min. Knowl. Discov. 30(4), 891–927 (2016)

  16. Cao, L., Yang, D., Wang, Q., Yu, Y., Wang, J., Rundensteiner, E.: Scalable distance-based outlier detection over high-volume data streams. In: ICDE (2014)

  17. Carbone, P., Fragkoulis, M., Kalavri, V., Katsifodimos, A.: Beyond analytics: the evolution of stream processing systems. In: SIGMOD (2020)

  18. Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: a survey. ACM Comput. Surv. 41(3), 96 (2009)

  19. Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection for discrete sequences: a survey. IEEE TKDE 24(5), 119 (2012)

  20. Choudhary, D., Arun Kejariwal, A., Orsini, F.: On the runtime-efficacy trade-off of anomaly detection techniques for real-time streaming data. CoRR, arxiv:1710.04735 (2017)

  21. Cook, A., Mısırlı, G., Fan, Z.: Anomaly detection for IoT time-series data: a survey. IEEE IoT J. 7(7), 88 (2020)

  22. Davis, J., Goadrich, M.: The relationship between precision-recall and ROC curves. In: (ICML’06), pp. 233–240 (2006)

  23. Demšar, J.: Statistical comparisons of classifiers over multiple data sets. JMLR 7, 1–30 (2006)

  24. Domingues, R., Filippone, M., Michiardi, P., Zouaoui, J.: A comparative evaluation of outlier detection algorithms. Pattern Recogn. 74(C), 406–421 (2018)

  25. Domingues, R., Filippone, M., Michiardi, P., Zouaoui, J.: A comparative evaluation of outlier detection algorithms: experiments and analyses. Pattern Recognit. 74, 478 (2018)

  26. Dua, D., Graff, C.: Uci Machine Learning Repository. University of California, School of Information and Computer Sciences, Irvine (2017)

  27. Dudani, S.: The distance-weighted k-nearest-neighbor rule. IEEE Trans. SMC 6(4), 325–327 (1976)

  28. Emmott, A., Das, S., Dietterich, T., Fern, A., Wong, W.: A meta-analysis of the anomaly detection problem. CoRR, arxiv:1503.01158 (2015)

  29. Goix, N., Drougard, N., Brault, R., Chiapino, M.: One class splitting criteria for random forests. In: ACML (2017)

  30. Goldstein, M., Uchida, S.: A comparative evaluation of unsupervised anomaly detection algorithms for multivariate data. PLoS One 11(4), 887 (2016)

  31. Granichin, O.N., Volkovich, Z., Toledano-Kitai, D.: Randomized Algorithms in Automatic Control and Data Mining, vol. 67. Springer, Berlin (2015)

  32. Guha, S., Mishra, N., Roy, G., Schrijvers, O.: Robust random cut forest based anomaly detection on streams. In: ICML’16, pp. 2712–2721 (2016)

  33. Gupta, M., Gao, J., Aggarwal, C., Han, J.: Outlier detection for temporal data: a survey. IEEE TKDE 26(9), 83 (2014)

  34. Hastie, T., Tibshirani, R., Friedman, J.H.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn. Springer, Berlin (2009)

  35. Herbold, S.: Autorank: a python package for automated ranking of classifiers. J. Open Source Softw. 3, 2173 (2020)

  36. Hodge, V., Austin, J.: A survey of outlier detection methodologies. Artif. Intell. Rev. 22(2), 69 (2004)

  37. Jacob, V., Song, F., Stiegler, A., Rad, B., Diao, Y., Tatbul, N.: Exathlon: a benchmark for explainable anomaly detection over time series. PVLDB 14(11), 58 (2021)

  38. Keller, F., Müller, E., Böhm, K.: Hics: high contrast subspaces for density-based outlier ranking. In: ICDE, pp. 1037–1048 (2012)

  39. Kohavi, R.: A study of cross-validation and bootstrap for accuracy estimation and model selection. In: IJCAI (1995)

  40. Kontaki, M., Gounaris, A., Papadopoulos, A., Tsichlas, K., Manolopoulos, Y.: Continuous monitoring of distance-based outliers over data streams. In: ICDE (2011)

  41. Li, J., Maier, D., Tufte, K., Papadimos, V., Tucker, P.: Semantics and evaluation techniques for window aggregates in data streams. In: SIGMOD (2005)

  42. Lindner, G., Studer, R.: Ast: Support for algorithm selection with a CBR approach. In: Principles of Data Mining and Knowledge Discovery, pp. 418–423 (1999)

  43. Liu, F.T., Ting, K.M., Zhou, Z.: Isolation forest. In: ICDM, pp. 413–422 (2008)

  44. Lobo, J., Jiménez-Valverde, A., Real, R.: AUC: a misleading measure of the performance of predictive distribution models. Global Ecol. Biogeogr. 17(2), 9008 (2008)

  45. Manzoor, E., Lamba, H., Akoglu, L.: Xstream: outlier detection in feature-evolving data streams. In: KDD (2018)

  46. Na, G.S., Kim, D., Yu, H.: Dilof: effective and memory efficient local outlier detection in data streams. In: KDD, pp. 1993–2002 (2018)

  47. Orair, G., Teixeira, C., Meira, W., Wang, Y., Parthasarathy, S.: Distance-based outlier detection: consolidation and renewed bearing. PVLDB 3(2), 788 (2010)

  48. Pang, G., Shen, C., Cao, L., Hengel, A.: Deep learning for anomaly detection: a review. ACM Comput. Surv. 54(2), 89 (2021)

  49. Pevný, T.: Loda: lightweight on-line detector of anomalies. Mach. Learn. 102(2), 116 (2016)

  50. Qin, X., Cao, L., Rundensteiner, E.A., Madden, S.: Scalable kernel density estimation-based local outlier detection over large data streams. In: EDBT (2019)

  51. Ramaswamy, S., Rastogi, R., Shim, K.: Efficient algorithms for mining outliers from large data sets. In: SIGMOD, pp. 427–438 (2000)

  52. Rastrigin, L.A.: The convergence of the random search method in the extremal control of a many parameter system. Autom. Remote Control 4, 1337–1342 (1963)

  53. Rogers, J., Gunn, S.: Identifying feature relevance using a random forest. In: SLSFS, pp. 173–184, Bohinj, Slovenia (2005)

  54. Roy, S.N.: On a Heuristic method of test construction and its use in multivariate analysis. Ann. Math. Stat. 6, 220–238 (1953)

  55. Sadik, S., Gruenwald, L.: Research issues in outlier detection for data streams. SIGKDD Explor. Newsl. 15(1), 78 (2014)

  56. Saito, T., Rehmsmeier, M.: The precision-recall plot is more informative than ROC when evaluating binary classifiers on imbalanced datasets. PLoS One 10(3), 708 (2015)

  57. Sathe, S., Aggarwal, C.: Subspace histograms for outlier detection in linear time. KAIS 56(3), 68 (2018)

  58. Silva, J., Faria, E., Barros, R., Hruschka, E., de Carvalho, A., Gama, J.: Data stream clustering: a survey. ACM Comput. Surv. 46(1), 9114 (2013)

  59. Somol, P., Grim, J., Filip, J., Pudil, P.: On stopping rules in dependency-aware feature ranking. In: CIARP (2013)

  60. Tan, S.C., Ting, K.M., Liu, F.T.: Fast anomaly detection for streaming data. In: IJCAI (2011)

  61. Tatbul, N., Lee, T.J., Zdonik, S., Alam, M., Gottschlich, J.: Precision and recall for time series. In: NIPS (2018)

  62. Ting, K.M., Washio, T., Wells, J.R., Aryal, S.: Defying the gravity of learning curve: a characteristic of nearest neighbour anomaly detectors. Mach. Learn. 5, 55–91 (2017)

  63. Tran, L., Fan, L., Shahabi, C.: Distance-based outlier detection in data streams. PVLDB 9(12), 96 (2016)

  64. Tran, L., Mun, M., Shahabi, C.: Real-time distance-based outlier detection in data streams. PVLDB 14(2), 7006 (2020)

  65. van Stein, B., van Leeuwen, M., Bäck, T.: Local subspace-based outlier detection using global neighbourhoods. CoRR, arxiv:1611.00183 (2016)

  66. Vanschoren, J.: Meta-Learning, pp. 35–61 (2019)

  67. Vanschoren, J., van Rijn, J., Bischl, B., Torgo, L.: OpenML: networked science in machine learning. SIGKDD Explor. 15(2), 96 (2013)

  68. Wang, H., Bah, J., Hammad, M.: Progress in outlier detection techniques: a survey. IEEE Access 7, 998 (2019)

  69. Wu, R., Keogh, E.: Current time series anomaly detection benchmarks are flawed and are creating the illusion of progress. In: IEEE TKDE (2021)

  70. Xia, S., Xiong, Z., Luo, Y., Xu, W., Zhang, G.: Effectiveness of the Euclidean distance in high dimensional spaces. Optik 4, 5614–5619 (2015)

  71. Yang, J., Rahardja, S., Fränti, P.: Outlier detection: how to threshold outlier scores? In: AIIPCC (2019)

  72. Yoon, S., Lee, J., Lee, B.: Ultrafast local outlier detection from a data stream with stationary region skipping. In: KDD (2020)

  73. Yoon, S., Lee, J., Lee, B.S.: Nets: extremely fast outlier detection from a data stream via set-based processing. PVLDB 12(11), 998 (2019)

  74. Zhang, E., Zhang, Y.I.: Average precision. In: Encyclopedia of Database Systems (2009)

  75. Zhao, Y., Rossi, A., Akoglu, L.: Automating outlier detection via meta-learning. CoRR, arxiv:2009.10606 (2020)

  76. Zimek, A., Filzmoser, P.: There and back again: outlier detection between statistical reasoning and data mining algorithms. Int. Rev. Data Min. Knowl. Discov. 8(6), 66 (2018)

  77. Zimek, A., Gaudet, M., Campello, R., Sander, J.: Subsampling for efficient and effective unsupervised outlier detection ensembles. In: KDD (2013)

  78. Zimek, A., Schubert, E., Kriegel, H.: A survey on unsupervised outlier detection in high-dimensional numerical data. Stat. Anal. Data Min. 5(5), 997 (2012)

Acknowledgements

The research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP/2007-2013) /ERC Grant Agreement n. 617393. The research work was also supported by the Hellenic Foundation for Research and Innovation (H.F.R.I.) under the “First Call for H.F.R.I. Research Projects to support Faculty members and Researchers and the procurement of high-cost research equipment grant” (Project Number: 1941).

Author information

Corresponding author

Correspondence to Vassilis Christophides.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Offline detectors

Based on the findings of previous benchmarking efforts [15, 24, 25], we considered representative offline detectors from three families, namely distance-based like the weighted \(\hbox {KNN}_W\) [51], density-based like LOF [14] and tree-based like IF [43] and OCRF [29].

1.1 Weighted K nearest neighbor (\(\hbox {KNN}_{W}\))

\(\hbox {KNN}_{W}\) is a score-based variation [27] of the distance-based K nearest neighbors (KNN) [51] classifier. An anomaly is a sample that lies at a substantially higher distance from the remaining samples (i.e., it has few neighbors at a close distance).

To assign a real-valued score to each sample, it computes a matrix of distances, called MD: each row represents a sample and each column the distance to another sample. Then, using MD, it computes the score of a sample p as the maximum distance dist over its K nearest neighbors (columns) as follows:

$$\begin{aligned} score(p) = \max _{q \in KNN(p)} dist(p, q) \end{aligned}$$
(A1)

A higher score indicates a higher degree of abnormality. The number of nearest neighbors K and the distance metric dist(\(\cdot \)) are the two hyper-parameters of \(\hbox {KNN}_W\). Its effectiveness depends on how carefully K is chosen w.r.t. the data characteristics. For a dataset of size N, \(\hbox {KNN}_{W}\) needs quadratic time \(O(N^2)\) to construct MD and \(O(NK)\) time to score all samples.
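To make the scoring of Eq. A1 concrete, the following minimal Python sketch (an illustration assuming NumPy and SciPy, not the implementation used in our experiments) builds the distance matrix MD and returns the \(\hbox {KNN}_{W}\) score of every sample:

```python
import numpy as np
from scipy.spatial.distance import cdist

def knn_w_scores(X, k=5, metric="euclidean"):
    """Score every sample by the distance to its k-th nearest neighbor,
    i.e., the maximum distance over its k nearest neighbors (Eq. A1)."""
    md = cdist(X, X, metric=metric)       # the N x N distance matrix MD, O(N^2)
    np.fill_diagonal(md, np.inf)          # a sample is not its own neighbor
    return np.sort(md, axis=1)[:, k - 1]  # k-th smallest distance per row

# Illustrative usage: the injected far-away point gets the largest score.
X = np.vstack([np.random.normal(0, 1, (200, 3)), [[8.0, 8.0, 8.0]]])
print(np.argmax(knn_w_scores(X, k=10)))   # prints 200, the injected anomaly
```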

1.2 Local outlier factor (LOF)

LOF [14] is a density-based detector which computes the local density deviation of a sample w.r.t. its neighbors. Essentially, LOF considers as an anomaly any sample that lies in a sparse area while its nearest neighbors lie in dense areas.

LOF first computes the reachability distance rd between a given sample p and another sample o according to Eq. A2, where dist(p, o) is the direct distance between the two samples and k-dist(o) is the distance of o to its k-th nearest neighbor.

$$\begin{aligned} rd(p, o) = max(dist(p, o), \text {k-}dist(o)) \end{aligned}$$
(A2)

Then, LOF computes the local reachability density of a sample p as the inverse average reachability distance of p from its k-nearest neighbors KNN(p):

$$\begin{aligned} lrd(p) = \frac{1}{\frac{1}{\vert KNN(p) \vert } \sum _{o \in KNN(p) } rd(p, o) }. \end{aligned}$$
(A3)

The final score of a sample p is computed by comparing its local reachability density to the average local reachability density of its neighbors:

$$\begin{aligned} LOF(p) = \frac{1}{\vert KNN(p) \vert } \sum _{o \in KNN(p) } \frac{ lrd(o) }{ lrd(p) } \end{aligned}$$
(A4)

The number of nearest neighbors K and the distance metric \( dist (\cdot )\) are the two hyper-parameters of LOF. The effectiveness of the algorithm strongly depends on the careful selection of K. LOF needs quadratic time \(O(N^2)\) to compute the distances between each pair of samples and linear time O(N) to score samples, where N is the dataset size.
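For reference, an offline LOF implementation is available in scikit-learn (Footnote 19); a minimal usage sketch follows, where the toy data and the value of n_neighbors are purely illustrative rather than the tuned settings of Table 14:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(200, 2)),     # dense inliers
               rng.uniform(-6, 6, size=(10, 2))])   # sparse candidate anomalies

# novelty=False scores the fitted data itself (the offline setting used here).
lof = LocalOutlierFactor(n_neighbors=20, metric="euclidean", novelty=False)
lof.fit(X)
scores = -lof.negative_outlier_factor_   # LOF(p): > 1 means lower density than neighbors
top_10 = np.argsort(scores)[-10:]        # indices of the 10 most anomalous samples
```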

1.3 Isolation forest (IF)

IF is a tree-based ensemble detector [43], which relies on the number of partitions needed to isolate a sample from the rest of the dataset. The fewer partitions needed to isolate a sample, the more likely it is to be an anomaly.

The algorithm uses a forest of random Trees built on sub-samples of the dataset drawn by bootstrapping of size Max Samples. For each sub-sample, it constructs an isolation tree by uniformly selecting features and their split values as internal nodes. Samples are stored in the leaves, and the actual height of the tree can be limited to a Max Height. The length of the path from the tree root to a leaf node measures how well a sample in that node is isolated from the others: short paths provide evidence of anomalies. The outlyingness score of a sample p is then computed by averaging its path length over all isolation trees in the forest as follows:

$$\begin{aligned} score(p, n) = 2^{- E(h(p))/c(n)} \end{aligned}$$
(A5)

where n is the Max Samples size, E(h(p)) is the average of the actual height h(p) (see Footnote 25) and c(n) is used for normalization:

$$\begin{aligned} c(n) = 2 (ln(n - 1) + 0.5772156649) - 2 (n - 1) / n \end{aligned}$$
(A6)

The closer the score is to 1, the more likely the sample is to be an anomaly. A score much smaller than 0.5 indicates an inlier.

The number of Trees t, the number of samples a tree may contain Max Samples m, and its Max Height h are the three hyper-parameters of IF. Max Height is typically set to \(\lceil log(n) \rceil \), while the variance of the scores is usually minimized with around 100 random Trees. By avoiding costly distance or density calculations, IF requires linear time O(tmh) to build a forest and linear time O(tnh) to score n samples.
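As a worked illustration of Eqs. A5–A6 (a sketch of the formulas above, not the forest construction itself), the normalization term c(n) and the resulting score can be computed as follows:

```python
import numpy as np

EULER_GAMMA = 0.5772156649

def c(n):
    """Normalization term of Eq. A6 (average path length of an unsuccessful
    binary search tree lookup); defined for n > 1."""
    return 2.0 * (np.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def if_score(avg_path_length, n):
    """Anomaly score of Eq. A5, given E(h(p)) over the forest and the
    sub-sample size n (Max Samples)."""
    return 2.0 ** (-avg_path_length / c(n))

# With Max Samples n = 256, a sample isolated after ~4 splits on average
# scores around 0.76, whereas one needing ~12 splits scores around 0.44.
print(if_score(4.0, 256), if_score(12.0, 256))
```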

1.4 One-class random forest (OCRF)

OCRF is a tree-based ensemble detector [29] that extends Random Forests. It relies on the adaptation of a two-class splitting criterion to the one-class setting, by assuming an adaptive distribution of anomalies in each node, i.e., assuming that the number of anomalies is equal to the number of normal samples in each node. After splitting, one child node captures the maximum number of samples in a minimal space, while the other captures the exact opposite. The closer the leaf containing a sample is to the root of the tree, the more likely this sample is to be an anomaly.

Similarly to IF, it builds a forest of Trees on sub-samples and features of the dataset drawn by bootstrapping of size Max Samples and Max Features, respectively. For each sub-sample, it constructs a tree by selecting the feature and split value that minimize the one-class Gini improvement proxy, given by the following formula:

$$\begin{aligned} I_G^{OC-ad}(t_L, t_R) = \frac{n_{t_L}\, \gamma n_t \lambda _L}{n_{t_L} + \gamma n_t \lambda _L} + \frac{n_{t_R}\, \gamma n_t \lambda _R}{n_{t_R} + \gamma n_t \lambda _R} \end{aligned}$$
(A7)

where \(n_t\), \(n_{t_L}\) and \(n_{t_R}\) are the numbers of samples in the node t and its children nodes (L, R), respectively, \(\lambda _L = Leb(X_{t_L})/ Leb(X_t)\) and \(\lambda _R = Leb(X_{t_R})/ Leb(X_t)\) are the volume proportions of the children nodes, and \(\gamma \) is a parameter which influences the optimal splits.

Table 9 The values of the varying and fixed hyper-parameters

The samples are stored in the leaves, and the actual height of the tree can be limited to a Max Height. The length of the path from the tree root to a leaf node measures how well a sample in that node is isolated from the others: short paths provide evidence of anomalies. The anomalousness score of a sample p is then computed by averaging its path length over all trees in the forest as follows:

$$\begin{aligned} \log _2 score(x) = - \left( \sum _{t\ \mathrm {leaves}}{\mathbbm {1}_{\{x \in t\}}\, \left( d_t + c(n_t)\right) }\right) / c(n) \end{aligned}$$
(A8)

where \(d_t\) is the depth of node t, and \(c(n) = 2H(n-1)-2(n-1)/n\), with H(i) being the i-th harmonic number.

The closer the score is to 1, the more likely the sample is to be an anomaly. A score much smaller than 0.5 indicates an inlier.

The number of Trees t, the number of samples a tree may contain Max Samples m, the number of features for those samples Max Features f and its Max Height h are the four hyper-parameters of OCRF. Similarly to IF, Max Height is typically set to \(\lceil log (n) \rceil \), while the variance of the scores is usually minimized with around 100 random Trees. OCRF requires linear time O(tmfh) to build a forest and linear time O(tnh) to score n samples.
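For illustration, the split selection of Eq. A7 amounts to evaluating a small function such as the following sketch over the candidate splits and keeping the minimizer (the argument names are ours; this is not the OCRF implementation of Footnote 20):

```python
def oc_gini_proxy(n_t, n_tL, n_tR, lam_L, lam_R, gamma=1.0):
    """One-class adaptive Gini improvement proxy of Eq. A7 for a candidate
    split of node t into children (L, R); the split minimizing this value
    is selected."""
    left = (n_tL * gamma * n_t * lam_L) / (n_tL + gamma * n_t * lam_L)
    right = (n_tR * gamma * n_t * lam_R) / (n_tR + gamma * n_t * lam_R)
    return left + right
```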

Appendix 2: Hyper-parameters

We are interested in assessing the effectiveness of a detector under optimal conditions. In this respect, we distinguish between hyper-parameters whose values can be set independently of the datasets and those whose values need to be estimated per dataset (see Footnote 26).

Table 9 presents the fixed values of various hyper-parameters we set in our experiments (e.g., training/testing splitting of datasets, window size and slide) as well as the hyper-parameters of detectors along with the range of values we tested per dataset using random search (RS) [52] (see Footnote 27). The employed ranges empirically ensure the existence of at least one optimal value per detector and dataset. To respect the isolation requirement [34, 39], the optimization phase should take place on an additional validation partition, built from samples of the training partition that remain unseen during testing. According to Tables 10–14, the optimal hyper-parameters of each detector per dataset differ in most cases from the default values originally proposed.
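As an illustration of this tuning protocol, the sketch below draws hyper-parameter values at random from given ranges and keeps the configuration maximizing AP on the validation partition. Here build_detector is a hypothetical factory returning an object with fit(X) and score_samples(X) (higher score = more anomalous); it stands in for whichever detector implementation is being tuned:

```python
import numpy as np
from sklearn.metrics import average_precision_score

def random_search(build_detector, param_ranges, X_train, X_val, y_val,
                  n_trials=50, seed=0):
    """Random search (RS) over `param_ranges` ({name: list of candidate
    values}), validated on the held-out partition (X_val, y_val)."""
    rng = np.random.default_rng(seed)
    best_ap, best_params = -np.inf, None
    for _ in range(n_trials):
        params = {name: rng.choice(values)      # sample one value per hyper-parameter
                  for name, values in param_ranges.items()}
        detector = build_detector(**params)
        detector.fit(X_train)
        ap = average_precision_score(y_val, detector.score_samples(X_val))
        if ap > best_ap:
            best_ap, best_params = ap, params
    return best_params, best_ap
```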

Table 10 Optimal hyper-parameter values of each online detector per dataset after tuning
Table 11 Optimal hyper-parameter values of online detectors HST/F, RRCF and RS-Hash per Exathlon dataset after tuning (only if the AUC ROC score of the detectors is greater than 0.5)

We are additionally interested in assessing how sensitive detectors are to the tuning of their hyper-parameters. In this respect, we compute the average coefficient of variation (see Footnote 28) of the MAP scores of the models tuned with different hyper-parameters across all datasets. As we can see in Fig. 17, for most online detectors (besides MCOD, LEAP and RS-HASH) the average coefficient is below or close to 0.10. Recall that the tiny performance differences between models are mostly due to the non-deterministic nature of the detectors. On the other hand, the average coefficient of variation of offline detectors lies around 0.2 and above, revealing that, regardless of their anomalousness criteria, they are much more sensitive to hyper-parameter tuning. As we can see from Table 10, the hyper-parameters of distance-based detectors differ in every dataset, which reveals the need for good tuning according to the characteristics of each dataset (value range, average distance, etc.). An interesting observation is that MCOD and CPOD succeed in having a smaller average coefficient of variation than the offline detectors (while LEAP is also close). This is attributed to the fact that they exhibit a performance similar to the Random Classifier in most configurations where the two hyper-parameter values are not close to the optimal ones, which leads to a lower coefficient of variation than expected. On the contrary, tuning does not significantly improve the effectiveness of tree-based and projection-based online detectors.
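For clarity, the sensitivity measure reported in Fig. 17 can be computed along the lines of the following sketch, assuming the MAP scores of all hyper-parameter configurations are available per dataset (illustrative code, not the evaluation scripts of the paper):

```python
import numpy as np

def avg_coefficient_of_variation(map_scores_per_dataset):
    """map_scores_per_dataset: {dataset name: array of MAP scores obtained
    with the different hyper-parameter configurations of one detector}.
    Returns the coefficient of variation (std / mean) averaged over datasets."""
    cvs = [np.std(scores) / np.mean(scores)
           for scores in map_scores_per_dataset.values()]
    return float(np.mean(cvs))

# A detector whose MAP barely changes under tuning yields a small value.
print(avg_coefficient_of_variation({"Wilt":   np.array([0.30, 0.31, 0.29]),
                                    "Isolet": np.array([0.55, 0.60, 0.50])}))
```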

Regarding hyper-parameters with fixed values across all datasets, we are finally interested in investigating the impact of the window size on detectors' effectiveness. To this end, we tested our detectors on four representative datasets w.r.t. the number of samples/features: Isolet (many samples/features), Wilt (many samples/few features), InternetAds (few samples/many features) and Diabetes (few samples/features). Figure 16 depicts the median ratio of the MAP scores, using the window size (ws) = 128 employed in our experiments as baseline. In terms of the window type, we used sliding windows on RRCF, MCOD, LEAP, CPOD, STARE and RS-HASH and tumbling windows on the remaining detectors. According to our experiments, the former detectors perform better on sliding windows than on tumbling ones. Note that the window slide indicates the number of past points to forget from the models built. HST exhibits a slightly better performance with ws = 128, especially on the datasets with few samples. L-S has a major performance boost using ws = 128 on datasets with few samples, while its performance remains stable on the rest. RRCF performs better on datasets with few samples with a window size of 64, while HSTF has a performance boost on datasets with many features using ws = 256. RS-HASH performs better using ws = 128 in all cases except on datasets with many samples and few features. CPOD, LEAP and STARE exhibit better effectiveness using ws = 128 in all cases. X-S is stable across all window sizes.

Table 12 Offline detectors’ number of wins (using AP Scores) and average AP difference from the winner per dataset
Table 13 Offline detectors' number of wins (using AUC Scores) and average AUC ROC difference from the winner per dataset
Table 14 Optimal hyper-parameter values of each offline detector per dataset after tuning

To ensure a fair comparison of detectors in the remainder of our work, we avoid varying the window size per dataset. In this respect, we set the window size to 128, as it proves to be optimal for most pairs of datasets and detectors. Moreover, we did not choose a window size of 256 (or greater), because we need to ensure at least one full window for testing after the training phase; note that the dataset with the lowest number of samples in our benchmark is Ionosphere, with 351 samples.

Fig. 16 Median ratio of MAP scores on different window sizes (64, 256) with 128 as baseline, using Diabetes, Isolet, InternetAds and Wilt

Fig. 17 Average coefficient of variation, across all datasets, of the MAP scores of the models built with different hyper-parameters: online detectors appear to be less sensitive

Appendix 3: AUC/MAP of detectors

For the sake of completeness, we also investigate the performance ranking of offline detectors. Table 12 depicts the total number of wins of offline detectors using MAP. X-B and LOF each outperform the remaining detectors in 7 datasets. This makes LOF the best-performing proximity-based detector and X-B the best ensemble-based one. KNN performs best in 3 datasets, as do OCRF and IF. The worst-performing detector is L-B, which manages to lead in only 1 dataset. It is worth mentioning that KNN, along with LOF, has the lowest average difference from the leader, at 9.8%. There are no major changes in Table 13, which illustrates the number of wins using AUC ROC. LOF achieves more wins in AUC ROC, being the best detector and leading in 9 out of 24 datasets, while X-B gets 2 fewer wins than before. OCRF drops to just one win and also has the highest average difference from the leader. It is worth mentioning that OCRF does not manage to make any split in some datasets, due to the high range of the features, which leads to an infinite volume (\(> 10^{308}\)).

Table 15 (M)AP Scores of online and offline detectors per dataset
Table 16 AUC Scores of online and offline detectors per dataset

As in the previous section (Sect. 4.2), we use the nonparametric Friedman test [23] in order to determine whether there is any significant difference between the average ranks of the detectors. With a \(p value < 0.000\), we reject at the 5% significance level the null hypothesis that all detectors' performances are the same. Subsequently, we use the post hoc Nemenyi test in order to compare the detectors in pairs. There is a significant difference when the difference between the average ranks of two detectors is higher than a critical distance (CD) of 1.539 (for 6 detectors on 25 datasets at a significance level a = 0.05). Figure 18a illustrates the ranking of detectors w.r.t. MAP scores. KNN and X-B are both ranked first (rank tie), followed closely by LOF. KNN is ranked higher than LOF, despite having almost half of its wins, due to its low average difference from the leader. We observe that there is a statistically significant difference between the first three detectors (LOF, KNN, X-B) and both OCRF and L-B, which come last. As we can see in Fig. 18b, there are no drastic changes in the ranking when using AUC ROC. LOF is first, as expected from the previous results (9/24 wins), followed by KNN and X-B. There is a statistically significant difference between the two proximity-based detectors (LOF, KNN) and both L-B and OCRF, while IF and X-B have a statistically significant difference only with OCRF.
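The statistical protocol above can be reproduced along the following lines, assuming SciPy and the scikit-posthocs package are available (an assumption for illustration; the same tests could be run with, e.g., Autorank [35]). The score matrix is filled with random values only to show the expected input shape (25 datasets × 6 detectors):

```python
import numpy as np
from scipy.stats import friedmanchisquare
import scikit_posthocs as sp

# scores[i, j] = AP (or AUC ROC) of detector j on dataset i;
# columns ordered, e.g., as LOF, KNN, X-B, IF, L-B, OCRF (illustrative values).
rng = np.random.default_rng(0)
scores = rng.uniform(0.2, 0.9, size=(25, 6))

# Friedman test on the per-dataset ranks of the 6 detectors.
stat, p_value = friedmanchisquare(*[scores[:, j] for j in range(scores.shape[1])])
print(f"Friedman chi2 = {stat:.2f}, p = {p_value:.4f}")

# Post hoc Nemenyi test: matrix of pairwise p-values between detectors.
print(sp.posthoc_nemenyi_friedman(scores).round(3))
```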

Fig. 18 Offline detectors' ranking

Overall, we observe that the two proximity-based detectors (LOF, KNN), as well as X-B, outperform all other detectors, with L-B and OCRF exhibiting the worst performance. LOF performs better when using AUC ROC, while the rest of the detectors remain stable w.r.t. both metrics.

Appendix 4: Meta-features

Table 17 Statistically significant correlations between the ratio of X-S divided by the respective online detector and meta-features. We report only pairs that had a statistically significant correlation at a significance level of 0.05

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Cite this article

Ntroumpogiannis, A., Giannoulis, M., Myrtakis, N. et al. A meta-level analysis of online anomaly detectors. The VLDB Journal 32, 845–886 (2023). https://doi.org/10.1007/s00778-022-00773-x
